Good evening, everyone, and welcome to our talk on efficient Kubernetes end-to-end testing: unleash the power of Prow. First, who we are. I'm Jason Braganza. I've been an independent IT consultant for the past 20 years. I also served as an announce shadow on the 1.25 release team, and I currently serve as the new membership coordinator for the project.

My name is Priyanka Saggu. I work at SUSE as a Kubernetes Integration Engineer, and I also do a few things in the upstream Kubernetes project. Currently, I'm a technical lead for SIG Contributor Experience and a GitHub admin for the Kubernetes project orgs. I'm also the release lead for Kubernetes 1.29, and we are ready to release it on December 13. This is our release logo; we have some stickers of it if you want, so please meet us after the talk. Thank you. There are lots of stickers; catch Priyanka after the talk.

What do we have on our agenda today? We'll look at the architecture of kubetest2, see how the upstream Kubernetes project integrates kubetest2 with Prow, and understand how all of this fits together with a couple of in-depth demos of the upstream project's testing in action.

So, Kubernetes end-to-end testing. We have all our source, the Kubernetes source, in GitHub repos. And folks obviously want to improve it, to add features, to fix bugs. So they submit their own code to the project in the form of a pull request, and we need to check for brokenness; we need to check that the code is all right. That's what we do in an automated fashion. The PR code, along with our existing code, is taken together at a snapshot in time. We create binaries based on that code, and then we stand up a cluster using the new code. The new cluster then runs a series of end-to-end tests, and the tests log a lot of stuff; that's what we get at the end as artifacts. The cluster is torn down, and what we have at the end is just the logs containing the test results.

All of the process I just described is driven by kubetest2. What is kubetest2? It's a framework for deploying Kubernetes clusters and executing end-to-end tests on them. It helps with cluster configuration of various kinds, with end-to-end testing and log collection, and then with the disposal of the test environments. The workflow I just described basically comes out like this: there's a build step; there's a cluster configuration step, like I just showed you, where it builds the cluster first and then brings it up; kubetest2 then runs the tests we specify and does a lot of log collection, which is the test step; and then it disposes of the test environment by bringing the cluster down. Cluster configuration, end-to-end testing, test environment disposal: that's the whole flow.

Now the architecture. It's split into three independent executables, like three legs of a stool or a tripod. The main binary is kubetest2. That's the one we interact with and use to drive the whole thing, but on its own it does nothing; if you run kubetest2 by itself, it says something like "deployer detected: nothing, tester detected: nothing". Those are the missing legs of the stool. The second leg is the deployer. kubetest2 is smart enough to notice if there's a binary on the PATH named kubetest2-<deployer-name>; it recognizes it as a deployer and picks it up, like over here. Since we had that, running kubetest2 now says there's a deployer, in the highlighted portion. Similarly, we have testers of various kinds, and they follow the same naming convention: kubetest2-tester-<tester-name>.
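To make that three-legged setup concrete, here's a minimal sketch of installing kubetest2 and seeing what lands on your PATH. The install command follows the kubetest2 README; the listing below is illustrative, not literal output.

```sh
# Install the kubetest2 binary plus the in-tree deployers and testers
# (requires a working Go toolchain).
go install sigs.k8s.io/kubetest2/...@latest

# Deployers and testers are separate binaries, discovered by naming
# convention: kubetest2-<deployer> and kubetest2-tester-<name>.
ls "$(go env GOPATH)/bin" | grep kubetest2
# kubetest2
# kubetest2-gce
# kubetest2-gke
# kubetest2-kind
# kubetest2-tester-ginkgo   (illustrative listing)
```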
And once all three legs are there, you can see kubetest2 coming in and detecting the deployer and the tester. That's the architecture in sum. Here's an example. You can have multiple deployers and multiple testers, and you can mix and match. The syntax of the command then basically becomes: you run kubetest2 with the deployer you want, with the tests you want, and with instructions to bring the cluster up and down. So it's kubetest2, then the deployer name; I've split this onto new lines for readability. There's --up, --down, then --test followed by the tester name, and the arguments you want to pass to the tester, like in this example, which is basically the upstream CNCF Kubernetes conformance tests against a GKE cluster: kubetest2 with the gke deployer, bring the thing up and down, run the ginkgo tester, and tell it to run the conformance tests.
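Spelled out, that GKE conformance invocation looks roughly like this; the shape (deployer, lifecycle flags, tester, then tester arguments after the -- separator) is the important part, and the focus regex is one common way to select conformance tests.

```sh
# Bring up a GKE cluster, run conformance tests with the ginkgo tester,
# and tear the cluster down. Everything after `--` goes to the tester.
kubetest2 gke --up --down \
  --test=ginkgo \
  -- \
  --focus-regex='\[Conformance\]'
```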
So how does the upstream project integrate kubetest2 and Prow? Let's take a little glimpse of the workflow. If lots of you are contributors, you've contributed code to the Kubernetes repos and you've seen that we can comment on PRs, and we also have automated commands, right? You can do a /close and you can do a /test. Here, for example, we're doing a /test all to run the tests against the code. All of that is handled by a CI/CD system called Prow, which mimics the nature of Kubernetes as a whole. Just as Kubernetes has a control plane that makes all the decisions, does the scheduling, and farms the work out to worker nodes, Prow has a service cluster which listens for events, does the scheduling, and distributes the work, and there are build clusters on which the tests and the work then get carried out. So over here, Prow is listening; it takes the event, sends it to the Prow service cluster, and then it creates a ProwJob that's scheduled onto a Prow build cluster.

Let's take a look at what's happening in the build cluster. The job comes in, a pod is created, and for our tests it pulls a specific image called kubekins-e2e; that is the thing that pulls kubetest2 down and runs it with the arguments we specify.

So the primary features of kubetest2, having looked at all this: we get a consistent cluster lifecycle; we know how to consistently bring any kind of cluster up and down. There's a decoupled implementation of deployers: you can have multiple deployers, you can have multiple testers, and you can mix and match them. There's a reproducible CI and local testing experience, where the stuff you're running on your CI/CD system you can also reproduce locally: pull kubetest2 down on your machine, pull your tester and your deployer down, and simulate roughly the same experience on your local machine too. And it has support for Boskos, which is a Prow component that helps with resource lease management, so you don't really have to deal with provisioning cloud projects yourself; a deployer will ask for a lease on, say, an AWS account or a GCP project, and Boskos will provide it.

We also have bespoke deployers. At present, kubetest2 only ships the GCE, GKE, and kind deployers in-tree. So how does anything else work? Those are external: kubetest2 enables running custom deployers for different cloud platforms out of tree. That's how we have AWS and Azure deployers working. You can write your own bespoke deployers. We don't have time to cover that right now, but my colleague Bianca has a whole talk on it, over here; you can scan this, and you can always get the slides later and check it out.

So, having spoken about Prow and looked at its architecture, let's see how it works in action. Let's dive into some demos. I'll hand it over to my colleague Priyanka.

Thank you, Jason. So we just learned what kubetest2 is and how it comes into the picture with the Kubernetes project. Let's see a few jobs from the Kubernetes project. For example, I'm the release lead, and what matters to me is having a green signal when I have to do the release, and we have jobs that give us that signal: whether our Kubernetes master branch is healthy, and whether the changes sitting in that branch are okay to ship in a certain release.

Before we do that, I want to introduce a prerequisite topic, something called Kubernetes version markers. Every single day we receive hundreds of PRs from many contributors around the world, and every time we receive a new change, we have a few jobs, a few scripts if you like, that keep checking out the code from the kubernetes/kubernetes repository with all these changes and building artifacts out of them. When I say artifacts, I mean Kubernetes binaries, because we want to test whether the new changes that are added to our kubernetes/kubernetes master branch actually build binaries that don't break anything else. That's where Kubernetes version markers help us. Kubernetes version markers are text files which act as a sort of public API for accessing Kubernetes builds.

What does this mean? For example, I created a PR to add some changes to the kubernetes/kubernetes repo. Now a job or a script ran, and it created a fast build of Kubernetes, which gave me Kubernetes binaries. I'll store those binaries somewhere, so that whenever I want to run some tests on a cluster built from these new binaries, I can just fetch them from the place where I stored them. In Kubernetes' case, we store them in GCS buckets. So for any successful Kubernetes fast build, we store the artifacts in a Google Cloud Storage bucket, and the text files pointing at those builds are what we call Kubernetes version markers.

Here's an example. This is a snapshot of a GCS bucket we use for the Kubernetes project, k8s-release-dev. If you see, there's a path I'm marking here, k8s-release-dev/ci, and inside that there's a file, k8s-stable1.txt. That's the file we're talking about as a Kubernetes version marker. What that file contains is this big version string. This is an old example; we're looking at a version marker for the 1.26 release. So how do we read this version marker, and what do we actually do with it? What we're seeing here is just a file which contains some version. But how do I get the build artifacts, the actual Kubernetes binaries, which I can use? Well, for whatever version I got from the k8s-stable1.txt file, there will also be a folder inside this same path, k8s-release-dev/ci, named exactly the version we got from that k8s-stable1.txt file. And inside that, what I see is all the artifacts built from a certain snapshot of the kubernetes/kubernetes repo.
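Since this bucket is public, you can poke at a marker yourself. A minimal sketch with gsutil, assuming the bucket and marker names from the slide; the version string shown is just an illustration of the format:

```sh
# Read the version marker: a text file containing a single build version.
gsutil cat gs://k8s-release-dev/ci/k8s-stable1.txt
# e.g. v1.26.3-41+<commit-sha>   (illustrative)

# The same path has a directory named after that version, holding the
# binaries built from that snapshot of the branch.
gsutil ls "gs://k8s-release-dev/ci/$(gsutil cat gs://k8s-release-dev/ci/k8s-stable1.txt)/"
```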
Which snapshot? We can understand that from the version marker itself. The first part of the version marker, v1.26.3, is the base release tag: there was a Kubernetes 1.26.3 patch release, and that's the first part of the marker. On the right-hand side, that big commit-like number is the latest commit on our release branch; we're talking about release-1.26 here, so it's the latest commit as of when this screenshot was taken. And the middle number is the number of commits on the release-1.26 branch between the time v1.26.3 was cut and that current latest commit; so there are 41 commits in that duration. That's what this version marker is telling us: this particular directory in our GCS bucket contains build artifacts from this particular snapshot of the kubernetes/kubernetes repo on the release-1.26 branch.

How do we do that? There's a job we run. Jason introduced Prow a bit. Prow is a CI/CD tool built by the Kubernetes project, mostly for the Kubernetes project; it's now adopted by other projects as well, which is great. You can understand it as any other CI tool, for example GitHub Actions or GitLab CI. There's a talk from KubeCon EU 2023 that covers entirely how to read these jobs; I'll try to explain a bit here as well.

The first highlighted part here, line number one, says it's a periodic. So we're defining a job and telling Prow it's a periodic job; it needs to run at a certain period. On lines two and four, I'm giving a name to this particular job, calling it ci-kubernetes-build-1.26, and saying: please run this job at every one-hour interval. Line number three, which I missed: cluster is k8s-infra-prow-build. Jason also introduced how Prow follows the model of Kubernetes itself, a control plane and a worker plane, or in Prow's case, service clusters and build clusters. And just as Kubernetes can have multiple worker nodes, we can have multiple build clusters in Prow; here we're basically naming which build cluster to use for this particular job.

Line number five helps us do that whole checkout business. We're setting decorate to true. What this does is tell Prow: now you need to check out something. We're saying we need to build Kubernetes from the kubernetes/kubernetes master or a certain release branch, and we need to build binaries; all of that checkout functionality is initiated by line number five. But what exactly to check out, we give on lines six to nine. We're telling it: look for the org kubernetes, the repo kubernetes, and the branch release-1.26; please check out the code from this particular repo and branch, and then do something with it. Then there's another set of annotations: whenever this job runs, we need to post the logs somewhere, and we need to see whether the job ran successfully or not. We show some of those statistics on a tool called TestGrid, and we say which TestGrid dashboard and which TestGrid tab name to use. That's not directly relevant for this particular talk, so I'll pass over it.
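Pieced together, the top of that job looks roughly like this. This is a hedged reconstruction from the slide, with illustrative values for the TestGrid annotations; the authoritative config lives in the kubernetes/test-infra repo.

```yaml
periodics:
- name: ci-kubernetes-build-1.26
  cluster: k8s-infra-prow-build   # which Prow build cluster runs this job
  interval: 1h                    # run every hour
  decorate: true                  # Prow's pod utilities handle checkout, logs, artifacts
  extra_refs:                     # what to check out before running
  - org: kubernetes
    repo: kubernetes
    base_ref: release-1.26
  annotations:
    testgrid-dashboards: sig-release-1.26-blocking   # illustrative value
    testgrid-tab-name: build-1.26                    # illustrative value
```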
Labels: we're also passing more information, because this job needs to run inside a Kubernetes cluster and it needs certain credentials. When I set things like preset-dind-enabled or preset-service-account, these labels provide some more data to our pod; I'll show that in the example right after this.

Finally, we have the spec. It's like any Kubernetes pod spec. We're giving an image on line 21; this is the image to use for the pod's container inside the Kubernetes cluster. We're giving it k8s-ci-builder; that's the image we use to build Kubernetes binaries. Everything else is similar to any other Kubernetes pod spec: we're setting up resources here, and on line 31 we're giving it privileged access. And finally, we're doing something inside that container: running a krel command. krel is a tool created, again, by the Kubernetes project itself for our release process; it stands for Kubernetes Release. It's exactly the tool we use to do any releases: our minor releases, patch releases, and any other RC, beta, or alpha cuts are all cut using this tool called krel. We also do fast builds, and that's what we're doing here. I'm just telling it: krel ci-build, build Kubernetes from the code we checked out from the kubernetes/kubernetes release-1.26 branch, build the Kubernetes binaries out of that, and dump them in a bucket, which we're telling it on line 38: dump all those binaries in k8s-release-dev; put all the images you create, apart from the binaries, in that registry; and add a version marker, so whatever version comes out of this particular fast build, put that version in a file called k8s-stable1.

So when that job actually runs inside a Kubernetes cluster, what does that look like? Let's see. Whenever Prow gets a trigger for it, a pod is created. We saw the pod is created using the container image k8s-ci-builder. We also gave it the information about where to check out the code from, so it clones the kubernetes/kubernetes repo at the release-1.26 branch. I also talked about certain labels prefixed with preset. What do those labels do? preset-service-account set to true gives us this: it sets up two environment variables, GOOGLE_APPLICATION_CREDENTIALS and E2E_GOOGLE_APPLICATION_CREDENTIALS, which give us what we need to interact with the GCS bucket where we're going to store our built artifacts; the actual service-account.json file is also added to our pod as a volume. The other preset, preset-dind-enabled, again sets an environment variable, DOCKER_IN_DOCKER_ENABLED=true, and creates an emptyDir volume, docker-graph, to make sure we can run Docker inside this container. And finally, once our pod environment is set up, we run the krel command, which we just discussed: it's going to do a fast build, create Kubernetes binaries out of this particular clone, dump them into the k8s-release-dev bucket, and update the version marker in k8s-stable1.txt.
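Continuing the same sketch, the labels and spec would look something like the following. The preset label keys and the environment variables they inject match what's in test-infra's presets, but the krel flags here are my approximation of the slide; check the real job config for the exact invocation.

```yaml
  labels:
    preset-service-account: "true"   # injects GOOGLE_APPLICATION_CREDENTIALS + service-account volume
    preset-dind-enabled: "true"      # injects DOCKER_IN_DOCKER_ENABLED=true + docker-graph emptyDir
  spec:
    containers:
    - image: gcr.io/k8s-staging-releng/k8s-ci-builder:latest   # image from the slide; tag illustrative
      command:
      - krel
      args:
      - ci-build                                   # a "fast build" of the checked-out branch
      - --bucket=k8s-release-dev                   # where the binaries land
      - --registry=gcr.io/k8s-staging-ci-images    # where the images land (illustrative)
      - --extra-version-markers=k8s-stable1        # writes ci/k8s-stable1.txt
      securityContext:
        privileged: true                           # needed for docker-in-docker
```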
I also talked about those annotations where we were setting the TestGrid names; this is how that UI looks. In the Kubernetes clusters where all these jobs run, hardly anyone in the community has admin access to actually go and exec inside the pod and look at the logs, and that's not ideal: we need to consume the logs to check if something failed at some point and figure out how to fix it. So we use these user interfaces, where we dump all the logs coming out of the pod. This is one of the user interfaces we have, Prow's log viewer, Spyglass. Here I can see everything that happened from the beginning of that pod's creation: we ran that krel command; we had the dind-enabled preset label set up, so Docker-in-Docker is enabled; we had also provided the service-account label, so it can see the service account; and once everything is set up, it runs the krel command. That's where we can find that job. And this is a screenshot from TestGrid.

So we now know what Kubernetes version markers are, how to use them, and how to use the artifacts we built and stored in GCS buckets to do something useful. That's where our second example comes in; we deep-dive into end-to-end tests. Here's another example from the Kubernetes project itself. It's a release-blocking job, which means it's one of the jobs that gives a CI signal on whether to go ahead with a release or not, and we'll see which releases from this particular example.

Again, going from the first line: it's a periodic job, and we're telling it, on line four, that it should run at a three-hour interval. Lines two and three are the same as before: we're setting the name of the job and giving it the build cluster it needs to be scheduled on, and we're setting up a few labels for adding all those credentials and volumes. Then the TestGrid dashboard, the TestGrid tab name, and a description. Again, we can't exec inside our pods in these Kubernetes clusters, so we need to show these logs someplace; we use TestGrid for that, and we're providing where exactly. This particular job's data will show up on the TestGrid dashboard conformance-gce, the particular tab will be conformance-gce-master, and we're giving a one-line description here.

We have two other annotations here: fork-per-release and fork-per-release-replacements. Kubernetes doesn't maintain just one minor release; we maintain multiple minor releases, and multiple patch releases for those minor releases. So on lines 13 and 14 we're giving that information: for every release, fork this job and replace those placeholders.
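Those annotations, sketched in the job's YAML. The annotation keys are the real ones test-infra uses; the replacement value is illustrative of the placeholder-substitution idea rather than copied from the actual job.

```yaml
  annotations:
    testgrid-dashboards: conformance-gce
    testgrid-tab-name: conformance-gce-master
    description: conformance tests on a GCE cluster built from master   # one-line description
    fork-per-release: "true"                                 # re-create this job per release branch
    fork-per-release-replacements: "master -> {{.Version}}"  # illustrative placeholder swap
```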
Then, again, we have the spec section for what's going to be inside this job's container. This time we're building the container with a different image, called kubekins-e2e. That's relevant because this image ships with something Jason introduced in the first part of this talk: kubetest2. That's the tool we use to build binaries or bring up a cluster, then run tests and bring down the cluster while collecting all the logs. So this is the image that ships with that tool, along with other tools like gcloud and kubectl: gcloud to talk to our GCS buckets and GCP, and kubectl to interact with the new cluster whenever it comes up. We can also have these jobs running on AWS, and the relevant tools for that ship in the image as well.

And now these are the arguments; here we're going into the details of what kind of test is running. On line 21 we're saying: use the kubernetes_e2e test scenario. There's a link at the bottom of this page, git.k8s.io/test-infra; that's the repo where the details of the tests that run inside these Prow jobs live, or at least a major bulk of those details, which in turn point to the other places where our tests are stored in the Kubernetes project. We have these scenarios stored at that path, and out of all of them we're telling it: use the kubernetes_e2e scenario, and whenever you use that scenario, use this other set of arguments as well to say what and where you actually need to run those end-to-end tests.

So let's see this in action again. Here we have a pod; we created the container using the kubekins-e2e image. We set preset-service-account to true, which helped us check out kubernetes/test-infra at master as well as add all those environment variables and the service-account volume. Similarly, we gave it an SSH preset: in this particular test we need SSH keys, and we're adding them with that other preset. And that's it; this is where our command comes into the picture. We asked it to run this particular test scenario, test-infra/scenarios/kubernetes_e2e.py. What this Python file does is create a kubetest2 command for us: it takes all the arguments we put in front of it and assembles the kubetest2 invocation.

So here we're telling it: use kubetest2, and use all this information I'm giving you. For example, the GCP service account to interact with the GCP cluster. Use the provider GCE: here we're saying you need to use the GCE deployer, which means bringing up a cluster on Google Cloud Platform. We're giving it other settings, like the GCP network, the configuration for master images, which images or zones to use, et cetera; we provide all that information. And the service account we provided is actually used to authenticate with gcloud. You may have noticed there's no GCP project flag anywhere here, and we do need to tell it which GCP project to interact with. That information comes from that Boskos piece Jason introduced earlier. Boskos is a lease management tool: whenever we run these jobs inside Prow, there's a lease manager in the picture that dynamically provides access to these cloud resources. So in this case, if the GCP project is missing, kubetest2 will try to get a GCP project from Boskos.

Once all that is done, this is where our Kubernetes version markers come into the picture. We're telling it: please don't build everything from scratch; don't go and build whatever new artifacts you'd otherwise build; rather, use what's coming from this version marker. So we tell it: use this CI bucket, k8s-release-dev, which we saw, and use this version marker, ci/latest-fast. A note here: we don't have just that one k8s-stable1.txt file alone; we have more files like it, containing different version markers, with different binaries attached to them. So here we're giving it another one, latest-fast: go look in the k8s-release-dev GCS bucket, at the latest-fast marker, and download from there.
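Put together, the command the scenario ends up assembling is shaped roughly like this. The deployer and tester names are from the slide; the test-package flags are my best reconstruction of how the ginkgo tester gets pointed at a version marker, so treat the exact flag names and values as approximate. With no GCP project flag given, kubetest2 would lease one from Boskos, as described above.

```sh
# Bring up a GCE cluster, run conformance tests via the ginkgo tester,
# and tear the cluster down. Instead of building Kubernetes from scratch,
# the tester fetches prebuilt binaries using a version marker.
kubetest2 gce --up --down \
  --test=ginkgo \
  -- \
  --test-package-bucket=k8s-release-dev \
  --test-package-marker=latest-fast.txt \
  --focus-regex='\[Conformance\]'
```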
Then it basically uses those binaries to bring up a cluster, runs some tests, which we asked to be the conformance tests via the ginkgo tester, and brings the cluster down. So what's really happening here: this is like a snapshot of a Kubernetes cluster. For example, here we have a cluster in a Boskos-leased project, something like k8s-infra-e2e-boskos-..., the name that came through our ProwJob context, and our kubectl config context is now set to that cluster. We ran the Ginkgo end-to-end tests, because we asked it to use the ginkgo tester, and we're using Ginkgo to run some conformance tests here. Whatever logs and test results are generated out of that test run, we take them out, dump them on a certain path in our container, and then bring down the cluster, because all we needed was to run some tests and check whether they pass, to make sure these changes work fine with our existing state of the Kubernetes cluster. That's what's happening. And in the end, if I, as a non-admin, have to check those logs, I'll look for them in a place like this: the link down there points to testgrid.k8s.io, dashboard conformance-gce, tab conformance-gce-master, which is the information we provided in the annotations earlier. So that's the end of our example two demo.

The Kubernetes project now recommends using kubetest2, but it also had a predecessor called kubetest. That's still in use in some parts of the Kubernetes project, but we're moving toward kubetest2 right now. So that's just a note there: if you're going to start with either kubetest or kubetest2, kubetest2 would be the recommendation.

Can we do all this Prow job testing locally? We don't want to just run new tests against kubernetes/kubernetes right away; before we do that, we want to test those changes locally, or at least somewhere. So there are some ways to test these Prow jobs. One of the easiest ways: pull the Docker image that we name in the pod spec of the Prow job down to your local machine, and then run all the commands that we give in the command section; I'll show a quick sketch of this in a moment. Or there's a tool called phaino that can do the same for you. You can point phaino at the URL of a job and ask it to run that job locally, and it will do it for you: it will look at the pod spec, fetch the image down for you, and keep prompting you to provide information like the GCP project or the GCP service account. So it does the manual handling for you. But that's not always possible, because not every one of us has access to these cloud service accounts. So the last approach, not the best, but what we usually do in the Kubernetes project, is to just PR these Prow jobs to the repo called kubernetes/test-infra, which is the home for all our Prow jobs for the Kubernetes project. You PR it there, it can then run inside the PR, and you can check whether the job is running or not, and based on that, merge it or make changes.
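Here's a minimal sketch of that first, pull-the-image approach, assuming you have Docker locally; the image tag and the command you'd then run are placeholders to copy from the actual job spec.

```sh
# Pull the image the Prow job's pod spec names (tag is a placeholder).
docker pull gcr.io/k8s-staging-test-infra/kubekins-e2e:<tag-from-job-spec>

# Start a shell in it and paste in the job's command/args by hand.
# --privileged mirrors jobs that need docker-in-docker.
docker run -it --privileged \
  gcr.io/k8s-staging-test-infra/kubekins-e2e:<tag-from-job-spec> bash
```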
The conclusion here: kubetest2 is a tool that lets you drive your entire Kubernetes cluster lifecycle, from actually building the binaries, with all the changes coming in from different places, to using those binaries to bring up a cluster, run some tests on it, and bring the cluster down, because we don't want to keep our costs running upward. Then you use the logs generated from those test runs to make a call. That could be anything: we use those test statistics to make many, many kinds of decisions, for example to create a CI signal on whether we should go ahead with a release or not, or whether to merge a PR or not, which was our first example.

And with that, we're at the end of our talk. If you want to try kubetest2, that's sigs.k8s.io/kubetest2. We have a few in-tree cloud providers there; kind would be the easiest one if you want to just get started, and GCE and GKE also come with it, and there are a few out-of-tree providers for Azure and, I believe, AWS. So if you want to try using those, that's also an option. And if you want to build a kubetest2 deployer of your own, there was another talk linked in the middle of the slides. If you want to talk to people who have built kubetest2, the place would be #sig-testing on the Kubernetes Slack, that is slack.k8s.io. The speakers are available as @psaggu and @Jason Braganza on that same Slack, and those are our emails. Any questions? I mean, we're at time, actually over time, and we're on break, so no time for questions now. Coffee break, and we'll be back in 20 minutes. Thank you.