So, hey guys, my name is Antonio. I'm usually known as "runcom" online, so you might know me; in case you don't, I'm an engineer at Red Hat. I work on the runtimes team with Dan Walsh, because everyone knows him, right? And I'm one of the leads for the CRI-O project upstream. Today I'm here to tell you about something I'm really proud of, and always bragging about: how we develop CRI-O after we made a promise to the upstream Kubernetes project that we would never break it, because we want to be the default runtime.

I'll briefly introduce CRI-O in case you don't know it. CRI-O started because, three years ago more or less, the Kubernetes community stated that the container runtime used at the time, which I won't name here, otherwise Dan is going to kill me, wasn't that great for production workloads. So they introduced an API, a contract between the kubelet and a container runtime, called the CRI, the Container Runtime Interface. Suffice to know that the CRI is just a client-server architecture: in CRI-O we implement the server side, and Kubernetes, specifically the kubelet, implements the client side. It's just like an HTTP web server: you get requests and you reply back.

CRI-O itself has two main pieces: the runtime service, which is in charge of the containers and pods lifecycle, and the image service. I'm showing you this because, given the client-server architecture, the way we develop and make sure we're not breaking Kubernetes is quite simple. This is the CRI in action: basically the kubelet on the left, and CRI-O and the other container runtimes adhering to the CRI on the right.

The first line is the promise we made to Kubernetes back then, three years ago. We said: we're going to provide you with the perfect way to run production workloads in Kubernetes, because, again, at the time things weren't really stable; every new version of that container runtime was breaking something, or wasn't performing that well. So our promise to Kube was: we will never break you, or at least we'll try. Up until now it hasn't happened, so that's pretty cool. And we do all of this also because we want to become, one day, the container runtime of choice for Kube itself, so that everyone can use CRI-O.

When we develop features for CRI-O, there are two main ways we get them. One is through the CRI. As an example, Kubernetes at some point said: all right, we need a way to run a container with a given GID; it was called RunAsGroup. The feature flow for that was: the CRI tells us we need this new field, we implement it, they provide some testing, we do our testing as well. That's how we develop features coming from the CRI. The other way we get features, or new functionality, is what I call R&D: we can use CRI-O as a playground for stuff, let's say. For instance, the Kata Containers folks, Clear Containers back then, came to us and asked: we want to run virtual machines through CRI-O. So we went ahead and leveraged Kubernetes annotations to have a way to spin up virtual machines, as opposed to plain runc containers, for more security and stuff like that.
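To make that concrete, here is a minimal, hypothetical sketch of what such an annotation-driven sandbox request can look like on the CRI side. It uses today's k8s.io/cri-api import path (the talk-era API was v1alpha2) and CRI-O's historical io.kubernetes.cri-o.TrustedSandbox annotation key; treat the key, the names, and the UID as assumptions rather than a definitive interface:

```go
package main

import (
	"fmt"

	// v1 is the current CRI API; at the time of this talk it was v1alpha2.
	pb "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// A pod sandbox config carrying the annotation CRI-O historically used
	// to route a pod onto a VM-based runtime (Kata/Clear Containers)
	// instead of a plain runc container.
	cfg := &pb.PodSandboxConfig{
		Metadata: &pb.PodSandboxMetadata{
			Name:      "vm-isolated-pod", // hypothetical name
			Namespace: "default",
			Uid:       "1234-abcd", // hypothetical pod UID
		},
		Annotations: map[string]string{
			// "false" = untrusted workload: CRI-O hands the pod to its
			// configured VM runtime rather than the default OCI runtime.
			// The exact key is an assumption; check your CRI-O version.
			"io.kubernetes.cri-o.TrustedSandbox": "false",
		},
	}
	fmt.Printf("sandbox %q annotations: %v\n", cfg.Metadata.Name, cfg.Annotations)
}
```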
So these two are the main ways we develop something new in CRI-O, and for every feature we ship a test, with extensive CI, which I'm going to talk about in a moment. All of this boils down to: we need stability. We need stability for Kubernetes. Again, as much as we can, we will never break Kube, and every new release of CRI-O takes care of any regression, every bug fix, and every performance regression, which matters most, so we never do anything that could undermine stability in Kubernetes via CRI-O. Another example of this is that CRI-O doesn't implement anything that isn't in the CRI itself. We don't do builds, for instance. The CRI has nothing to do with builds, so we don't do builds. Having something in CRI-O that could do builds would mean we could introduce bugs, performance regressions, and whatnot.

So how do we make sure we never break anything, or at least try not to? We have extensive testing and we run a lot of tests. I believe we're running more than 1,500 tests at this point. Those are divided into many suites; I'll go one by one in the next slides, but basically we have suites from upstream Kube, we have our own integration tests, we test against OpenShift as well, because it's built on Kube, you know, and CRI validation tests, the validation for any CRI runtime, so we make sure we validate against that as well. And performance tests; those aren't run for every pull request, as I'll show you later. All of this boils down, again, to being confident about the tool we're developing and shipping to customers, users, and communities. But more importantly, I guess, it's about being confident about changes in our tool. I believe many of you are engineers, so you know that even a single-line change can go super wrong; you can blow everything up. All of this testing that we make heavy use of is about being confident about the changes we make in our tools, specifically CRI-O for this talk.

Another thing we make sure of when we develop CRI-O features and fix bugs: every CRI-O version maps to a given Kubernetes version. So we have CRI-O 1.11, which maps to Kubernetes 1.11. Our test matrix is made of many rows and columns, as many as the CRI-O and Kubernetes versions we support at the time, so it gets complicated pretty fast.

Back in the day we started with integration and unit tests. Since CRI-O is just a server, we wrote a simple client which connects to the CRI socket, and we make sure that for a given request we get the right response: start a container, start a pod, all right, everything is fine (there's a sketch of the idea below). This is where our testing journey started; three years ago we were just two engineers. We made sure this area was covered, but we were far, far away from being able to call ourselves stable. So at some point, since we were built for Kube, we started running what's called the node end-to-end (e2e) test suites of Kube. There were probably 300 or 500 tests, something like that, and we decided to run all of these node e2e tests for every code change that came through GitHub.
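Since CRI-O is just a server, the integration client he describes is conceptually tiny. Here is a hedged sketch of such a client in Go, not CRI-O's actual test code: it dials the usual default CRI-O socket and makes a couple of request/response round trips, using today's k8s.io/cri-api import path (v1alpha2 in the talk's era):

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	pb "k8s.io/cri-api/pkg/apis/runtime/v1" // v1alpha2 at the time of the talk
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// CRI-O serves the CRI over a local unix socket; this is its default path.
	conn, err := grpc.DialContext(ctx, "unix:///var/run/crio/crio.sock",
		grpc.WithInsecure(), grpc.WithBlock())
	if err != nil {
		log.Fatalf("dialing CRI-O: %v", err)
	}
	defer conn.Close()

	// The CRI's two halves, both served on the same socket:
	rt := pb.NewRuntimeServiceClient(conn) // pod and container lifecycle
	img := pb.NewImageServiceClient(conn)  // image pulls, listing, removal

	// Request/response, just like a web server: ask for the runtime
	// version and check we get a sane reply back.
	v, err := rt.Version(ctx, &pb.VersionRequest{})
	if err != nil {
		log.Fatalf("Version request failed: %v", err)
	}
	fmt.Printf("runtime: %s %s (CRI %s)\n",
		v.RuntimeName, v.RuntimeVersion, v.RuntimeApiVersion)

	imgs, err := img.ListImages(ctx, &pb.ListImagesRequest{})
	if err != nil {
		log.Fatalf("ListImages failed: %v", err)
	}
	fmt.Printf("%d images known to the runtime\n", len(imgs.Images))
}
```

A real integration suite then asserts on the responses: run a pod sandbox, start a container in it, and fail the test if any reply deviates from what the CRI contract promises.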
But we were running this manually, because the integration tests and the unit tests ran with Travis on GitHub and that was fairly easy: everything was containerized and it just worked. For this one, though, Travis wasn't enough to spin up a Kubernetes cluster and run all of it. So we were running it manually, sorry for that. And I'm pretty sure, some day in the past, another engineer came to me and asked: did you run all the node e2e tests on that particular patch? And I probably said yes when maybe I didn't, or the other way around. That just wasn't scalable.

Then we do CRI validation, the CRI's own validation, which is the critest binary: a binary that connects to CRI-O and validates, like the integration tests, that everything goes the way it should. At that point we had our own CI and we started wiring everything together, and we added support for the upstream e2e test suites, which is well over 1,000 tests. So we were running all of these together. Then Kata Containers and Clear Containers came along as well, and they offered their CI so we make sure we're not breaking their virtual machine workloads either. And all of this runs on every pull request. For every pull request we have, to sum up, probably 2,000 tests running, making sure that even a one-line change in our Go code doesn't result in breakage anywhere. We also run all the OpenShift e2e tests, but not in the CRI-O CI at the moment; they run for every Origin pull request, because that's even heavier.

And we also do performance runs. We're not running those for every pull request, because they're really heavy and usually disruptive, but we make sure we run at least one performance test just before a release, well, not the day before, some time before a release, so we're able to catch any regression. Again, with other container runtimes in the past this wasn't really the case: the Kubernetes community would actually see a performance regression well after the release date. From these performance runs we get so many fancy graphs that we then need to go and understand. All of this is red, this one is blue, you may know why, and because this one is wrong, we're also able to catch stupid regressions like this one: we had a memory leak, and you can see the memory going crazy over time across many runs. We were able to ship fixes just for that (the sketch below shows the general idea behind catching such a leak). CRI-O was working, and working just fine if you ran all 2,000 tests, but it wasn't working in production: with a big enough cluster you were going to hit this memory leak, and you would have hit it if we hadn't had these performance test runs.

We first ran all our end-to-end tests sequentially, and at the beginning it wasn't really nice because it took four hours to run everything. We're running in less than two hours now, because we parallelized it to spin up more nodes. So all of this comes down to continuous integration, testing, whatever you want to call it. For us it's just something that makes sure we don't break our users, our community, our customers. This is just a list of the last 12 or so checks on a pull request, between the CLA sign-off and all the other tests we have.
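The perf tooling itself isn't shown here, but the idea behind catching a leak like that one is simple: sample the daemon's resident memory over a long run and flag steady growth. A minimal sketch, assuming a Linux /proc filesystem; the sampling interval and the "doubled RSS" threshold are made-up illustrations, not CRI-O's actual criteria:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// rssKB parses the resident set size (in kB) of a process
// from /proc/<pid>/status (Linux only).
func rssKB(pid int) (int, error) {
	f, err := os.Open(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "VmRSS:") {
			return strconv.Atoi(strings.Fields(s.Text())[1]) // "VmRSS: 12345 kB"
		}
	}
	return 0, fmt.Errorf("VmRSS not found for pid %d", pid)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: leakwatch <pid>")
		os.Exit(1)
	}
	pid, _ := strconv.Atoi(os.Args[1]) // pid of the runtime under test
	base, err := rssKB(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for i := 0; ; i++ {
		time.Sleep(30 * time.Second) // sampling interval: arbitrary choice
		cur, err := rssKB(pid)
		if err != nil {
			return // process exited; end of the run
		}
		fmt.Printf("sample %d: %d kB (baseline %d kB)\n", i, cur, base)
		if cur > 2*base { // naive heuristic: RSS doubled since baseline
			fmt.Println("possible leak: resident memory keeps growing")
		}
	}
}
```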
Again, the e2e test, probably the one I've been talking about most, is about Kubernetes features, security and stuff like that. We were able to catch bugs we had in CRI-O when we first introduced it, because there were some options that weren't actually supported by CRI-O. And every time, we make sure this doesn't break before merging anything.

So what do we spot when we run all of this testing? We usually spot bugs, of course, and regressions, those are typical, but I guess more importantly we spot flakes, and this is actually cool, because then you have to go and investigate what's going on. Flakes, for those who don't know, are tests that fail maybe once in 100 or 1,000 runs. Usually those flakes are bugs. Sometimes they're not, maybe it's just something in the universe flipping bits on the run, but usually they're bugs, and there are bugs in tests as well.

Again, I'm really proud of all of this, because by running it and making sure this cycle works every time, we're also able to contribute back to the communities we're part of, namely OpenShift and Kubernetes. We were able to contribute patches to fix tests, and most of the time those were race conditions, because Kubernetes had been tailored, before, to run with Docker, I'll name it. With CRI-O and the other new container runtimes, like the one whose name starts with a "c", we are faster, and I mean that's a fact, so we spot so many race conditions that need fixing, and we spot those by running these tests every time, every day, for every pull request where we change something (there's a distilled example of such a race below).

And as a bonus point, we want to be the default, as I wrote on the first slide, and we're also making sure we have something like this. This is the Kubernetes TestGrid: for every test there's going to be a green or red box, and with all of this we're able to track what's happening with our code, so we're able to spot if something is going wrong, when it went wrong, and which commit introduced a bug or whatever.

So, the takeaways for us. The first one is obvious, and I guess everyone realizes it: it's critical for every project to have some kind of testing, and, even more important, automation, probably more important because we are really lazy. As I told you before, a colleague could have asked me in the past, did you run all the tests on your machine, and I probably told him yeah, I did, while I was actually sleeping, or out drinking beer, something like that. So we're lazy, and we'd better not count on ourselves but have a machine, a program, to make sure we don't break anything.

We're still not the default in Kubernetes; we're still working on that. That's depressing, but we're working on it. Our next steps are to make CRI-O even more stable and even more welcoming to the whole community, especially around Kubernetes, where people make heavy use of Ubuntu-like systems, so one of the next steps we're going to take, and we've been discussing this forever at this point, is to add some CI for Ubuntu as well. We have Travis, but we cannot run all our CI in Travis; it just wouldn't work. We want to add more tests: a test for every feature, for every bug fix, for every regression. We ask, we actually mandate, adding a new test for everything, except, well, when it really doesn't make sense.
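To make the flake point from earlier concrete: race-condition flakes usually pass almost every run and only fail under unlucky scheduling. Here's a distilled, self-contained illustration, not CRI-O code; Go's race detector (go run -race) flags it deterministically even on runs where the output happens to look correct:

```go
package main

import (
	"fmt"
	"sync"
)

// Two goroutines touching shared state without synchronization: the
// classic shape of a flaky test. Most runs print 1000; once in a while
// increments are lost, and the "test" fails seemingly at random.
func main() {
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter++ // unsynchronized read-modify-write: a data race
		}()
	}
	wg.Wait()
	fmt.Println("counter =", counter) // usually 1000, occasionally less
}
```

Re-running a suspect test hundreds of times with the race detector on is the usual way such flakes get pinned down before they can be fixed.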
And we also want to do something I guess OpenShift is already doing: producing artifacts from the CI itself, so that other people and other projects can grab those and run their own tests on them, namely RPMs, debs, and so on. And all of this must be automated, of course; we're not talking about running any of this manually at this point.

If you want to get involved, or if you have any ideas on how we could enhance our CI or our flows and stuff like that: there's a blog post covering CRI-O in general, there's our GitHub page, we're on Freenode and Slack, and we've got all sorts of websites, with logos and stuff like that. And thank you. That's it. We have time, if you guys have any questions?

That usually, well, that depends on Travis as well. It doesn't always trigger instantly; from time to time Travis triggers instantly, and from time to time you have to wait for the virtual machine or whatever container Travis gives you to come up, and stuff like that. So it can take around one hour or something like that, or even less; maybe 40 minutes. Forty minutes to one hour, I'd say.