Hi there everybody, welcome to my talk. I'll be giving a talk about SIG Testing and some general updates. My name is Mohamed. I'm a senior DevOps engineer at ThousandEyes by Cisco. I am a Kubernetes maintainer, I am the SIG K8s Infra tech lead, and I also work with SIG Testing and SIG Release on some projects. I'm also a Knative maintainer, another CNCF incubating project, where I'm the Productivity working group lead and work on similar concerns.

So a couple of topics I want to discuss today. I want to talk a bit more about SIG Testing: what we do, some of the tools and systems that we have, and some projects that we've been working on lately.

All right, so what is SIG Testing? We're interested in the effective testing of Kubernetes, and in automating away a lot of the toil that's related to testing very large code bases. We've got frameworks, tools, and infrastructure that make it very easy to write and run tests, to ensure that Kubernetes is extremely stable, and to develop and test at scale. It's particularly important that we can track and detect flakes and make sure they don't end up in releases. As for leadership, we have three chairs and five tech leads, everybody working on various parts of the SIG Testing ecosystem.

All right, so moving on to tools and systems. We've got a couple of tools that we've built. The first one is kind, which is Kubernetes in Docker. It's a very powerful tool; I'll talk about it in a second. Another thing we've got is kubetest2, a test framework that we've built to launch test clusters for Kubernetes. And the other thing we have is the e2e test framework, a Go program that we wrote that allows us to run all sorts of end-to-end tests. And we have a couple of production systems that we use. The first one is Prow, which is the Kubernetes CI. You can go there and take a look at all the jobs that are running, their health and status.
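Since kind comes up a lot in this talk, here's a minimal sketch of spinning up a throwaway cluster with it, assuming Docker plus the kind and kubectl binaries are installed; the cluster name is just an example:

```shell
# Create a single-node Kubernetes cluster inside a Docker container
kind create cluster --name demo

# Point kubectl at it and check the control plane is reachable
kubectl cluster-info --context kind-demo

# Tear everything down when you're done
kind delete cluster --name demo
```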
And the other thing we have is TestGrid, which is, effectively, a grid of test results.

All right, so this is Prow. That's the address there, prow.k8s.io. These are jobs that run on there, so it looks something like that. There are plenty of jobs, and if you look at the timestamps on the right, you can see that we run quite a lot of them; there are about 10 jobs scheduled within two minutes. This is what it looks like when you open a test. This particular job runs what we call the Kubernetes e2e tests on Google Cloud: we spin up some virtual machines on Google Cloud, install the Kubernetes binaries, and run our test suite, mostly conformance plus a number of features.

Prow also does a lot of powerful things for us. Chat ops is one: it allows people to merge PRs, request reviews, and retest jobs when they fail. Over there you can see me applying a couple of labels to a PR.

TestGrid, then. This is a product that Google built. It's mostly open source, but the UI is a work in progress, so if you're a front-end engineer, I'm looking forward to hearing from you. But yeah, this allows us to visualize all the tests that we run. For any given job, if it writes JUnit results, we can see every single test, whether it passed or failed, and the history of that test over a period of time. You can see that on the next slide here. This is a conformance test, right? It tests whether Kubernetes meets the specification that was set by SIG API Machinery and the other SIGs. On the screen, you can see it ran about four or five times a day, and there have been a couple of green runs. There was a run in progress around the time of that screenshot, and I think there was a failure earlier. So we can take that failure and see what was going on. For that particular one, there was a failure in cluster provisioning, so it bugged out. This test is usually green. Here's another thing.
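The chat ops commands are just comments you leave on a GitHub PR or issue, and Prow picks them up. A few common ones, as a rough listing (the label values are only examples):

```
/retest          re-run the failed CI jobs on this PR
/lgtm            approve the code as a reviewer
/approve         approve as an OWNERS-file approver
/kind bug        apply a "kind/bug" label
/sig testing     apply a "sig/testing" label
```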
If you look at the summary, these are other jobs that are related to this test suite. The ones that are red are permanently failing, and we probably should fix them. The ones in blue are either green or have the odd flake.

So kubetest2. This is a test framework that we've got: a program that allows us to launch a Kubernetes cluster. On this screen here, you can see an invocation of a test, right? What we're doing is using kops to launch a cluster on Google Cloud with this particular configuration. The configuration on this screen sets the Kubernetes version and the CNI, and this particular run is testing alpha features, which is a bit of an interesting one. Right at the bottom, you can see that we are focusing on these particular test features and skipping a lot of others.

Kind, so Kubernetes in Docker. This program is used a lot; pretty much anybody who develops Kubernetes uses it to do e2e testing, and so do a lot of downstream projects. I work on Knative, so we use it quite a lot. It allows us to stand up a real Kubernetes cluster with working functionality, so it's really great for testing Kubernetes itself or other projects that target the Kubernetes APIs. It's really fast: it takes about a minute to launch and you've got a working cluster, as you can see on the screen there.

All right, so project updates. Since last KubeCon, we've been working on a couple of things. The first thing is Prow CI on AWS. Last year, as you might have heard, Amazon gave us some credits, so we put them to use. At SIG Testing, we've created a build cluster to run some of our jobs. We have a large number of jobs that don't need to run on a specific cloud or even need access to a cloud API; all they do is run a bunch of unit tests, which is very straightforward. So we've scheduled some of those jobs to run on AWS. But it's been a bumpy ride.
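As a rough sketch of what a kubetest2 invocation like the one on the slide looks like; the deployer and the regexes here are illustrative, not the exact flags from the screenshot:

```shell
# Bring a cluster up, run a filtered set of e2e tests through the
# ginkgo tester, then tear the cluster down again.
kubetest2 gce \
  --up --down \
  --test=ginkgo \
  -- \
  --focus-regex='\[Feature:SomeAlphaFeature\]' \
  --skip-regex='\[Serial\]'
```

Everything after the bare `--` is passed through to the tester rather than the deployer, which is how the focus and skip selections mentioned above get to Ginkgo.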
We've had some issues with EKS, as you might know, those of you who take EKS to production. We've had a couple of nodes dying, and kind was bugging out a little bit. Many thanks to Ricky in the audience for helping out, and Marco and Patrick as well; they've done a great job.

The other initiative that we're working on with AWS is that we need to test Kubernetes on AWS nodes. It's not something that the project has done for a long time. Kubernetes was initially open sourced by Google, so we've been running our tests on Google Cloud, which is great. But we also have the requirement, or the need, to be able to run these tests anywhere. Right now we're targeting AWS, and if another cloud vendor turns up with more credits, we'll probably be able to run it there too.

Another fun thing that we're working on right now is what we call scale testing. We have a special interest group called Scalability, and their job is to test how Kubernetes runs when you're using 100 nodes, or 5,000 nodes, or 15,000 nodes. That's the largest cluster you can run on GKE, so we need to test to make sure that works. We're trying to get that to work on AWS, and it's a work in progress. This here is our core Kubernetes test suite running on AWS, and it's great; it took us quite a while to get there. Here's another one; sorry, the first one was the node tests. We test how the kubelet behaves on various operating systems, and we also test the kubelet API itself; that's what the second one was.

This is a project that I'm working on. There's the KEP number if you're interested, but the idea is that since v1, back in 2015, some Googlers wrote these very complicated shell scripts to launch clusters on Google Cloud. Now these scripts are very brittle and fragile, and it's very difficult to add features to them, right? Also, more importantly, if you're a developer that wants to run a real e2e test, you're going to struggle to run a script like that on your computer.
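For context, the legacy flow looks roughly like this; `cluster/kube-up.sh` is the real script, and the environment variables shown are a simplified sketch of how it was typically driven:

```shell
# From a Kubernetes source checkout: pick a provider and cluster
# size via environment variables, then run the provisioning script.
export KUBERNETES_PROVIDER=gce
export NUM_NODES=3
./cluster/kube-up.sh

# ...and a matching script to tear the cluster back down.
./cluster/kube-down.sh
```

The configuration surface is a large set of environment variables read by nested shell scripts, which is a big part of why they are hard to extend or run locally.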
Another fun thing that we're trying to do is test on different chips and architectures. ARM is one that we're looking forward to, and it's not quite possible today with the current cluster provisioning tooling that we've got. So this project is kicking off. It's going to be alpha for this release, which is coming out later this year. And hopefully, if things go to plan, I really want to get rid of this thing by the end of next year. So hopefully when I'm here next year, I can tell you that it's gone.

For those of you who are interested, this is what it looks like, right? We're calling a Python program that's a wrapper for a Go program, with all these arguments, to go and run a test. It's kind of complicated. By comparison, this is the pure Go program, kubetest2, right? You can run it on your computer with the same set of flags that are on my screen and you'll get exactly the same result that you can see. The first one, you could probably get it to run, but you'd probably need Python and make and a couple of other things installed on your computer, and it would be tricky. As for the migration, it's a work in progress. I need to fix a couple of things; there are parts of the clusters that we provision that are missing a few things, and we've also got some badly written tests that we're trying to fix.

All right, so the e2e test framework. Last KubeCon in Amsterdam, one of the tech leads, Patrick, gave a great presentation about the e2e test framework: how it works, how to write good tests. I recommend that you watch it. But since then, he's introduced a new feature which allows us to use labels, right? In the past, if you wanted to run an e2e test, you had to specify two flags, focus and skip, to tell the framework which tests you wanted to run. And it was a little bit complicated, right? You couldn't make fine-grained selections. So there's a new style that allows us better control over the test suite that we run. Do we have any questions? All right, yeah.
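Roughly, the difference between the two styles looks like this; the test names and labels are illustrative, and `--ginkgo.*` is how the e2e binary forwards options through to Ginkgo:

```shell
# Old style: select tests with regexes over the full test names
./e2e.test --ginkgo.focus='\[Feature:Example\]' --ginkgo.skip='\[Serial\]'

# New style: select tests with a boolean expression over labels,
# so you can combine conditions instead of wrestling with regexes
./e2e.test --ginkgo.label-filter='ExampleFeature && !Serial'
```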
I think you need to walk over there and talk into the mic, yeah.

Can you go back one slide? Can you talk through what some of those label filters are doing?

Okay, yeah, I can do that. So these are labels for alpha features. By default, we don't want to introduce an alpha feature and run it all the time, because you've got to modify your Kubernetes cluster to enable these alpha features, right? You've got to put some feature gates on there. The default test suites that we run as part of our release-informing and release-blocking jobs are not designed to run alpha features; it's not going to work. We do have a dedicated alpha job, but the other jobs don't have that. So it's very important that the new feature you're writing, if it's gated off, has a tag on there that says it's an alpha feature. It's in the format NodeFeature if it's a node thing, or Feature if it's a Kubernetes API thing.

Is there anywhere that test writers can go to get resources on this?

We have some documentation about how to write an e2e test. You should be able to find it on the kubernetes.dev website for Kubernetes contributors, to get an idea of how to write e2e tests.

Okay, and does that have, like, a decoder ring for some of these labels?

Yeah, it tells you how to write tests and what the criteria are to move tests to GA, et cetera, et cetera.

Cool, thank you.

Any other questions?

Is there a reason you guys didn't choose to start with, or I guess use, open source frameworks like Robot Framework? There's a couple of other ones out there. It looks like you're kind of rolling your own, setting up your test harness and creating the test logic and everything. Am I reading it wrong?

Not quite. We do use Ginkgo; that's an open source framework, and it drives the e2e test framework, right? We can't just use Ginkgo directly, so we've got to write a shim, which is what we call the e2e test framework. We've got a mini program that wraps around Ginkgo to do stuff.
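As an aside, if you want to try a gated feature locally, kind lets you turn feature gates on from its cluster config; a minimal sketch, where the gate name is a placeholder and not a real feature gate:

```shell
# Feed a cluster config to kind on stdin, enabling a feature gate
# ("MyAlphaFeature" is hypothetical; substitute a real gate name)
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  MyAlphaFeature: true
EOF
```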
Yeah, any other questions? All right, going once, going twice. Okay, thank you so much, everybody. One more thing that I forgot to mention: I'm looking for more contributors. If you're interested in this space, I'm looking forward to hearing from you. We do have a GitHub repository; it's called kubernetes/test-infra. There are a lot of issues there, so please scan the QR code. We can also be found on the Kubernetes Slack; #sig-testing is the place to go. So come along and bring some questions. We do have some well-defined good first issues to work on. Thank you so much, everybody.