Welcome, everybody, to "CRI-O: The Runtime Control Room". I'm Sascha, one of the maintainers of CRI-O, and today I have the pleasure of being here with Urvashi.

Hello, everyone. My name is Urvashi Mohnani. I'm a software engineer at Red Hat working in the container runtime space, I'm a maintainer of CRI-O, and I'm here today with Peter.

Hey there. My name is Peter Hunt. I'm a software engineer at Red Hat, mainly working on CRI-O, Podman, and conmon. And I'm also here with Mrunal.

Hi, everyone. I'm Mrunal Patel. I'm also a software engineer at Red Hat, and I'm a CRI-O maintainer.

Welcome, everyone. So what is CRI-O? CRI-O is a container runtime that implements the Kubernetes Container Runtime Interface (CRI) using Open Container Initiative (OCI) images and runtimes. That means CRI-O supports pulling OCI-compatible images and running them with OCI-compatible runtimes such as runc or crun. The projects that make up CRI-O, such as containers/image, containers/storage, Podman, and runc, are continually pushing the envelope of container technology and Linux as they integrate new features added to the kernel. CRI-O balances stability for the core CRI features that Kubernetes needs while adding knobs to improve security as well as to incubate new features.

This talk is about how a CRI-O admin, sitting in the control room so to speak, can configure CRI-O to make your clusters more secure out of the box, and about the knobs that let you try out new features such as user namespaces. We will cover all the different ways that CRI-O and the workloads running on it can be configured. So without any more delay, let's begin.

All right, I would like to talk to you about the basic configuration of CRI-O. Most people will look at /etc/crio/crio.conf, which is the main configuration file, written in TOML. Every configurable part of CRI-O can be set there; everything we can adjust via the command line interface can also be done in this configuration file. For example, we can specify the storage driver and the storage root. We can also set which underlying OCI runtime should be used, for example switching from runc to crun. We can add security-related configuration, like setting the default seccomp profile, the default AppArmor profile, or the default capabilities which should apply to all workloads. And we also have some debugging helpers, like setting the log level or the log level filter.

Right now we are working on making those configurations dynamically reloadable, and we are introducing a modular configuration approach which allows us to split the single configuration file into multiple ones. So how does this work? CRI-O supports dynamic configuration, which means we just have to send SIGHUP to the server, and the server will reload the options. This works for features like the default AppArmor profile or the seccomp profile, and it also works for the log level and the log level filter. If we configure systemd to automatically send SIGHUP on reload, then we can simply run systemctl reload crio.

For the drop-in configurations, we defined a new default directory, /etc/crio/crio.conf.d. There, if we specify a TOML table like [crio.runtime], we can for example override the log_level and set it to "debug". Those configuration files are processed in alphabetical order, which makes them easy to organize if you prefix them with a number, for example.
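To make this concrete, here is a minimal sketch of what such a drop-in could look like; the file name is hypothetical, but log_level is one of the options mentioned above:

```toml
# /etc/crio/crio.conf.d/01-log-level.conf (hypothetical file name)
# Drop-ins are read in alphabetical order, so a numeric prefix
# makes the processing order explicit.
[crio.runtime]
log_level = "debug"
```

Since the log level is one of the dynamically reloadable options, a systemctl reload crio (or a plain SIGHUP to the crio process) is enough to apply it without a restart.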
The drop-ins also work with the dynamic configuration reload feature. And they have a higher priority than /etc/crio/crio.conf, which gives administrators the possibility to deploy a default configuration, while users can override it simply by adding their own drop-in configurations. And now I would like to demo that to you.

What we can see here is that I will now start CRI-O with a configuration file in my local directory. I will unset the configuration directory, because on my local machine I already have /etc/crio/crio.conf.d available, and to avoid any effect on my demonstration I will just set it to an empty string. So here we are: CRI-O is loaded and seems to work. We can now run something like crictl ps, and we get an empty response because nothing is running. CRI-O per default does not log that verbosely; at info verbosity it does not log much at all. We are more or less limited to warning or info messages related to the CNI network.

To change that, we can go into our CRI-O configuration, find the log level, and set it to debug. Then we save it, and for my demo I will now restart CRI-O. Here we can see that we are now in debug mode, and if we now run any RPC request, every request is logged. Those log files can become pretty huge in the end, because the verbosity is now really, really high. To provide a better debugging experience, we added a log level filter. For example, we can now go into our CRI-O configuration and change the log filter, which is a regular expression. We can set it as a case-insensitive regular expression matching "request". And now we don't have to restart CRI-O at all; we can just reload it by sending a SIGHUP, like this. CRI-O also gives us an indication that it reloaded the configuration: it reloads the configuration, updates the options, and we can see that the log filter is now set to "request". If we now run crictl, we can see that only the requests are logged. This way, administrators can filter the log messages very effectively.

What else can we do? For the drop-in configuration files, we can also start CRI-O with the configuration file directory pointing to a crio.conf.d in my local directory. If we do that, we can see that in this directory there is, for example, a log modification file which resets the log filter. So while the main configuration file changes the log level filter to log only the requests, this override file simply unsets it again. And if we now run crictl, we can see that we get all the data, not only the requests but also the responses. For example, here we have the ListImages response, which lists all the images available on my local machine. That's it for my little demo.
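As a rough sketch, the reload workflow from the demo boils down to something like the following; the file name is made up, and the filter value mirrors what was shown:

```sh
# Add a drop-in that filters the debug logs down to the gRPC requests.
# "(?i)" makes the regular expression case-insensitive.
cat <<'EOF' | sudo tee /etc/crio/crio.conf.d/10-log-filter.conf
[crio.runtime]
log_filter = "(?i)request"
EOF

# Reload CRI-O without restarting it; both forms send SIGHUP.
sudo systemctl reload crio
# or: sudo kill -HUP "$(pidof crio)"

# Subsequent crictl calls now only produce log lines matching the filter.
sudo crictl ps
```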
And now I would like to hand over back to Urvashi.

So CRI-O uses the containers/storage library to manage copy-on-write file systems for containers, and the containers/image library for pulling images from container registries. These libraries are also shared across our other container tools like Podman, Buildah, and Skopeo. Having a shared back end means that we can also share configuration files. To give an example, let's say I want to block image pulls from a specific registry. Instead of having to configure this for each container tool that I use, I can just put it in a shared file that all the container tools can access.

So we have three shared files like this. The first one is the registries.conf file, which is used to configure anything related to your registries. Users can go there to configure insecure registries, blocked registries, your list of unqualified-search registries, as well as mirror setup. The second file is the policy.json file. This file holds policy requirements for container images, specifically around where you're pulling the image from. So you can have policy requirements for various transports, including trusted keys and image signatures. The third file is the storage.conf file. This is used for configuring options for your container storage: things like your driver, your run root, your graph root, et cetera.

And now we're going to talk about container networking configuration. CRI-O uses CNI for configuring networking for Kubernetes pods; CNI is widely adopted as a networking solution in Kubernetes. The CNI configuration mainly consists of two settings: one is the path to the directory in which the configuration is stored, and the second is where the plugin binaries are stored. You can see in the example on the slide that we have a network_dir and a plugin_dirs option for these two settings. In addition, we also support setting a default network if you intend to do so. And CRI-O also supports bootstrapping networking via DaemonSets: you can have a DaemonSet that copies the configuration and the binaries over to these directories, and CRI-O will pick up the configuration. So any pods that start up after that configuration is in place will pick up the CNI settings.

In addition to all of these options, CRI-O also has the ability to define multiple runtimes and their handlers. In Kubernetes, different workloads have different performance and security needs, and users need a way to toggle between different runtimes. So Kubernetes has the runtime classes feature, which is slated to go GA in 1.20, and which asks the CRI implementation to use a different runtime, or to use runtimes differently. Admins then have the ability to create runtime classes and optionally add admission controllers or policies to gate them for different users. This gives admins a great deal of flexibility in configuring the ways their users can use different runtimes for different workloads.

In the bottom left here, you can see a basic runtime class example. The runtime class handler corresponds to one of the runtimes in CRI-O's runtimes table; for example, the handler might be a high-performance runc, and a pod would then select it via its runtime class name. On the right, we have examples of a couple of different ways you can configure CRI-O with different runtimes, and we'll go into a bit more detail on them. We have the generic runc; we have a runc that is configured as high-performance for CPU load balancing; we have a runc that allows the user namespace annotation; and then we have Kata, which provides a bit more security, as it's a kernel-separated container.
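To illustrate what such a runtimes table can look like, here is a hypothetical crio.conf fragment; the names and paths are mine, not taken verbatim from the slide:

```toml
# Hypothetical runtimes table entries in /etc/crio/crio.conf.
[crio.runtime.runtimes.runc]
runtime_path = "/usr/bin/runc"

# Same binary, but intended for workloads that want CPU load
# balancing disabled via a pod annotation.
[crio.runtime.runtimes.high-performance]
runtime_path = "/usr/bin/runc"

# Kata runs containers with kernel-level separation for stronger isolation.
[crio.runtime.runtimes.kata]
runtime_path = "/usr/bin/kata-runtime"
```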
In addition to all of the ways that you can configure CRI-O, there's also the pod annotation, which is a pretty generic feature in Kubernetes, but I'm just going to go over it. It's a generic key-value map in the pod metadata, and it allows passing unstructured data through to varying levels of the stack. What this allows us to do is that CRI-O supports some specific annotations, such as the userns-mode annotation, among others, and admins can use these to let users request differing configurations of the runtime. CRI-O can then use those annotations to configure the runtime, as long as that user is allowed to do so. We'll go into a userns-mode annotation example in a little bit.

But first, in summary: admins have the ability to configure CRI-O in a varying number of ways, covering storage, image, networking, and all of these other options. In addition, they have the ability to add runtime classes to gate the varying behavior that CRI-O configures in the runtime. And on top of that, they can have different controllers gate runtime classes or annotations before they ever reach CRI-O. This gives admins a lot of power in configuring their Kubernetes nodes for their varying use cases and workloads. So now we're going to have the first example of that, and Urvashi will talk about the runtime class feature.

Yeah, thanks for that. So we can use runtime classes as well as pod annotations to disable CPU load balancing for certain containers that are using a high-performance runtime. The way this is done is that a pre-start hook runs to disable load balancing on the CPUs specified in the pod spec, and then a pre-stop hook runs to re-enable load balancing on those CPUs once the pod is about to be stopped. This is helpful for workloads where the expense of context switching cannot be tolerated. So the admin or the user gets the ability to disable CPU load balancing on certain CPUs. I have a quick demo on how this is done.

Okay, so in this demo I have an OpenShift cluster. The first thing I did with my cluster was create a MachineConfig object that dropped a config file called 99-runtimes.conf into crio.conf.d. In this config file, I am configuring my high-performance runtime class. That was just the basic setup. We can go onto the node now to see what the file actually looks like under the crio.conf.d folder, and we can see here that I have created a runtime called high-performance. This is important because we will be creating our runtime class using oc, and CRI-O needs to know that a high-performance runtime exists so that it knows to run the pre-start and pre-stop hooks.

Since we're already on the node, we can check the value of the scheduler flags for the various CPUs that I have on this node. It's important to keep in mind what the values here are, 4783 and 4655, because once we disable load balancing on a CPU, its value will drop by one. The other thing I did was create a KubeletConfig CR to update the CPU manager policy to static. This allows me to request whole CPUs exclusively on certain nodes. If you look at the kubelet.conf file here and scroll down right here (I don't even see my cursor), it was set to static.

So the next thing is that I created a high-performance runtime class, with the handler set to high-performance, and I called it hyper-test. This is just a quick description of what the runtime class looks like.
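A sketch of what that RuntimeClass object might look like, reconstructed from the demo narration (so treat the names as approximate):

```yaml
# Hypothetical RuntimeClass: the handler must match the runtime name
# defined in the crio.conf.d drop-in ("high-performance").
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: hyper-test
handler: high-performance
```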
So now we are ready to create our pod where we would like to disable CPU load balancing. Here, for the pod, I have set the cpu-load-balancing.crio.io annotation to true, I have set the runtime class name to the hyper-test runtime class that I created, and I have requested two CPUs under my resources. So the pod is up and running now. Let's figure out which node it's running on, so we can go onto the node and verify that CPU load balancing was actually disabled on the two CPUs that the container got.

The first thing we have to find is the cgroup slice, so we can figure out the CPUs it is using. So I'm just getting my container ID and then finding its cgroup, and we got the slice. We can see here that the CPUs it got are CPUs one and three. So we can check the value of the flag for these two CPUs now, and here we can see that the values dropped by one: they are now 4782 and 4654 for CPU one and CPU three. We can also take a look at CPU two, for example, to double-check that load balancing is still enabled there and that we didn't accidentally disable it. As you can see, its values are greater than the disabled ones. So we successfully disabled CPU load balancing on the two CPUs that the pod got.

Once the pod is done running, we don't leave CPU load balancing disabled; we re-enable it with the pre-stop hook, as I mentioned before. So let's quickly delete the pod here, go back onto the node, and check the same flag values again. And yep, as you can see, they went back up. So load balancing on these two CPUs is enabled again. And that's all I have for my demo.

All right, let's talk a bit about security-related configuration. The Container Runtime Interface defines the main behavior of how a container runtime that implements the Kubernetes API should behave, but this behavior is probably not always the most secure one. For example, if we specify a seccomp profile as an empty string, then this will be considered unconfined, and this applies to basically all workloads in the cluster that do not use the runtime default seccomp profile or a localhost profile. We therefore decided to add a new configuration option called seccomp_use_default_when_empty, which should help us increase the security defaults of container runtimes. Turning this option on applies the runtime default seccomp profile that ships with CRI-O to all workloads that do not explicitly specify something like unconfined or a localhost profile.

And I prepared a demo to show it to you. First of all, let's have a look at this new configuration option. It's part of crio.conf, and we can just have a look at the description of the option; per default it's set to false. If we now run CRI-O (for demonstration purposes we will again unset the configuration directory) and then create a Kubernetes workload, we can see that seccomp is not enabled for this workload. We can verify it via the process status in /proc, where Seccomp points to zero, so it's not enabled at all. If we now create a drop-in configuration that enables the feature for us, it looks like this: we just have a file in crio.conf.d that overrides seccomp_use_default_when_empty to true. If we now run CRI-O with that configuration directory and create a new Kubernetes workload, we can verify in the same way that seccomp is enabled for that workload.
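For reference, a minimal sketch of such a drop-in; the file name is an assumption, the option is the one discussed above:

```toml
# /etc/crio/crio.conf.d/20-seccomp.conf (hypothetical file name)
# Treat an empty seccomp profile field as "runtime default"
# instead of "unconfined".
[crio.runtime]
seccomp_use_default_when_empty = true
```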
And this is a great enhancement to the default security, and a good example of strong security defaults from my perspective. That's it for my demo. Thanks, Sascha.

So next we're going to talk a little bit about user namespaces in Kubernetes and CRI-O. Real quick, I'm going to do an overview of user namespaces for people who aren't aware. You can think of users and groups as people, and the full range of IDs available on the host as a house. A user namespace is like people living not in the full house but in a little doll house within the house, that is, in a subset of the range of IDs. Without going too far into it, there are quite a few security advantages to user namespaces. While inside the container, the process can think it's privileged and be given privileges from the perspective of the container: a process in a container can mount, for example, because it thinks it's UID zero, root. But on the outside, it's actually an unprivileged process, and if the container were able to break out, it wouldn't actually be able to do anything on the host. So it's much more secure to run a process of any kind of privilege inside a user namespace.

User namespaces in Kubernetes have been a really long-standing issue. The original enhancement request was brought up in 2016, and implementations have been attempted at various times throughout the last four years. And yet, still, we have another KEP in 2020 that describes how we may be able to implement user namespaces again. So that's the sad news. The happy news is that, much like the PID limit, an option CRI-O introduced way back at the beginning to prevent fork bombs from DoSing a system, and which wasn't supported in Kubernetes until 1.19, I believe, or 1.18, somewhere around then, CRI-O has added support for user namespaces before upstream Kubernetes has solidified on an implementation. This allows admins to play around with user namespaces and use them, and we made sure to implement it in a way that's secure.

What we did is configure allowed annotations: there is a user namespace annotation, io.kubernetes.cri-o.userns-mode, and only pods that run with certain runtime classes are allowed to have that annotation interpreted so that a user namespace is actually created for the pod. That allows admins to stop anyone from creating user namespaces by not allowing the annotation in any runtime handler; to allow only some people to use user namespaces via admission-controller policies on runtime classes; or to give everyone access to user namespaces by allowing the annotation in all of their runtime classes.

So I'm going to do a quick demo on that. All right, we're going to start off by creating a vanilla Kubernetes cluster running CRI-O. While this bootstraps, I'm going to show a little bit more of what we're going to be working with. We're going to start off with our runtime configuration for the user namespace case. Notice we have the runtime named runc-userns, and its allowed annotation is io.kubernetes.cri-o.userns-mode. That means this runtime is allowed to interpret io.kubernetes.cri-o.userns-mode to configure a user namespace for a newly created pod.

Looking forward a little bit, we're going to look at the three Kubernetes objects we're going to create. First, we're going to create a runtime class that uses the handler runc-userns; it's going to be called runc-userns-class, and any pod that wants the userns-mode annotation interpreted needs to use that class.
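Sketched in crio.conf terms, that gating could look like the following; the runtime name mirrors the demo, and the allowed_annotations spelling reflects recent CRI-O releases, so treat the details as assumptions:

```toml
# Hypothetical runtime entry: only pods scheduled with this handler
# may have the userns-mode annotation interpreted by CRI-O.
[crio.runtime.runtimes.runc-userns]
runtime_path = "/usr/bin/runc"
allowed_annotations = ["io.kubernetes.cri-o.userns-mode"]
```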
Then we're going to create a pod, the userns pod, which has the userns-mode annotation set to auto. Auto just means asking CRI-O to create a 65K-sized user namespace. Nothing super fancy, but it's sufficient for the majority of use cases, and it allows CRI-O to do that kind of delegation. There are finer-grained controls, but we're not going to go over them in this demo. This container is just going to sleep for a day, and it's going to use the runtime class runc-userns-class, so it will actually be allowed to do that.

Then we're also going to create a not-userns pod. This one is going to attempt to set the annotation to auto, but it's actually not going to work, because it isn't configured with the correct runtime class name, so CRI-O will refuse to interpret the annotation, and the pod will just be given the default user namespace. This allows admins to have finer-grained control. It's going to sleep for two days, which will allow us to differentiate the two pods in the ps output.

So now we're going to wait for this cluster to come up. All right. Now we're going to create our three objects, and we'll check that the pods are running, which they are. The way to check whether a user namespace is in use is to look at the UID map of the PID, so we're going to find the PID of each of them. Remember: the one-day sleep is our userns pod, and the two-day sleep is our not-userns pod.

We're first going to look at our userns pod and its UID map. Now note how this file is structured: the first number is the beginning of the UID range with respect to the container's user namespace, the second is the beginning of the UID range with respect to the host's user namespace, and the third is the length of the range. So the container thinks it's running as UID zero, but it's actually running as UID 20,000, and there are 65K allocated IDs, which should be enough for our pod. This means that any process may think it's root inside the container and be able to do things with elevated privileges within the confines of the container, but from the perspective of the host it's just some random UID and it can't do anything fun. So if it happens to break out, heaven forbid, it won't really be able to do anything with its elevated capabilities from within the user namespace.

And here, just to demonstrate that this is gated on the runtime class, we're going to show the not-userns pod. Here we see that its UID map begins at root from the perspective of the container and also begins at root from the perspective of the host, covering all of the available UIDs. So basically, this pod is running in the host user namespace: if PID one in the container is running as root, it's actual root on the host, and any privileges it has are privileges from the perspective of the host. And notice how, even though we attempted to give the annotation to the pod, it didn't work; the annotation wasn't interpreted, so the user namespace wasn't created. That means an admin stopped a user from creating a user namespace when they weren't supposed to, which demonstrates the fine-grained control. And that's it for the demo.
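For reference, a sketch of the userns pod manifest described above, reconstructed from the narration (object names and the image are assumptions):

```yaml
# Hypothetical manifest for the pod that requests a user namespace.
apiVersion: v1
kind: Pod
metadata:
  name: userns-pod
  annotations:
    # "auto" asks CRI-O to allocate a 65K-sized ID range for the pod.
    io.kubernetes.cri-o.userns-mode: "auto"
spec:
  runtimeClassName: runc-userns-class  # the handler gated in crio.conf
  containers:
  - name: sleeper
    image: busybox
    command: ["sleep", "1d"]  # one day, to tell the two demo pods apart
```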
Thank you for that awesome demo, Peter. So CRI-O is currently an incubating project in the CNCF. The most recent version of CRI-O is 1.19, and we continue to move in lockstep with the Kubernetes version. CRI-O is stable, and it is the container runtime used in production by SUSE CaaS Platform and OpenShift 4.x clusters.

So what's in the future for CRI-O? As mentioned before, CRI-O already has support for user namespaces as well as cgroups v2, and we are working on pushing these features into upstream Kubernetes as well, so that clusters can take advantage of them. We are also moving some of CRI-O's components over to Rust to improve performance, and we see CRI-O graduating in the near future.

If you want to find out more about CRI-O, here are some resources for you. We are available on Slack and IRC, and we have the awesome.md doc on our repo, which has links to past talks as well as articles and resources around CRI-O. And we also have a pretty cool coloring book that talks about the different container tools and how they work together. I think now we can open the floor to questions. Thank you.