Hi, and welcome to Contributing to Kubernetes Conformance Coverage. I'm Zach, and I'm Caleb, and we are from ii, and today we are going to be talking about what conformance is, why it's valuable, and the tools that we can use to help achieve it.

To start, a bit about ii. We are a group of technical folk here in Aotearoa, New Zealand, with a focus on cooperative coding. We believe that pairing is sharing, and we pair in everything we do, including these presentations. You can find out more about us at ii.coop. The people that make up ii are a small but powerful bunch: Hippie Hacker, who is the founder and sort of the vision of ii; myself and Caleb, who you have just met; Steven, who is our resident test writer; Rian, who is our project manager and helps keep us on track and focused; Berno, who does a lot of the automation and Prow jobs, some of which we'll be seeing later on; and Brenda, who ensures that all of us can actually function.

And now let's talk about Kubernetes conformance. What is conformance? Kubernetes conformance is a program that ensures every vendor's version of Kubernetes supports the required APIs, just as the open source community version does. It's valuable to have consistency across the platform wherever it's run. And why is conformance important? Good question. It enables portability of workloads, freedom from vendor lock-in, and stable APIs. It also allows consistency across providers. Really, I expect my workloads to run the same regardless of which vendor it is or wherever it is. There's a good background and rationale for conformance at cncf.io/ck. There, you will also see some nicely detailed steps on how vendors can be certified as Kubernetes conformant. Right now, there are about 67 certified distributions. You can see the full list at landscape.cncf.io by clicking on the Certified Kubernetes link on the left. And this list is great.
That means that you can have consistent, unsurprising, fully conformant behavior across a breadth of providers. These vendors are certified and added to the list that we just saw through a transparent and open process on the k8s-conformance repo on GitHub. We will cover this process shortly. This process is able to happen because in Kubernetes, conformance is defined through the API and a test suite. That allows tools to be built that fit within existing Kubernetes workflows. Two great examples of that are Sonobuoy and APISnoop.

How do I certify my distribution? I use Sonobuoy. Sonobuoy is a command line utility that interfaces with your cluster. It deploys the conformance suites, allowing you to generate data from your cluster to submit in your PR. You can install it with the first command, and then run a generic suite against your cluster with the second command. Sonobuoy will then run the full e2e test suite on your cluster. As it's running, you can invoke sonobuoy logs to see what is happening. What you'll see is the tests that are running at that time and their status, in a nice big wall of text. You can also invoke sonobuoy status, which will give you a nicely summarized result, showing when Sonobuoy is complete and whether you passed.

In this example, while we did pass on completion, you'll see that the test count was one. So we skipped a whole bunch of e2e tests in our results. This will become important later if we try to submit this PR. At the end of the run, you'll end up with several files. These represent the complete results of the test suite. You must include these in your PR to get certified. Now that you have the results, let's go and fork the k8s-conformance repo, and then we'll add the files in a PR. The complete instructions for how to do this can be found on that repo.
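The Sonobuoy flow described above can be sketched as a short shell session. This is a sketch rather than the exact commands from the slides: the version number and download URL are assumptions based on Sonobuoy's GitHub releases, and it assumes kubectl is already pointed at your cluster.

```shell
# Install Sonobuoy from a GitHub release (version here is an example)
VERSION=0.57.1
curl -L "https://github.com/vmware-tanzu/sonobuoy/releases/download/v${VERSION}/sonobuoy_${VERSION}_linux_amd64.tar.gz" \
  | tar -xz sonobuoy
sudo mv sonobuoy /usr/local/bin/

# Run the full conformance suite against the current kubectl context
sonobuoy run --mode=certified-conformance

# While it runs: stream the test output, or get a short summary
sonobuoy logs -f
sonobuoy status

# When complete, retrieve the results tarball
# (it contains, among other things, e2e.log and junit_01.xml)
outfile=$(sonobuoy retrieve)
echo "results saved to ${outfile}"
```

Note that certified-conformance mode runs the complete suite; a partial run, like the one-test example mentioned above, would not be accepted in a submission.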
Here is a good example showing a PR that was opened for Talos to the CNCF k8s-conformance repo. Their PR includes the JUnit and e2e log files coming from that Sonobuoy run, along with a README showing how to reproduce the results and a PRODUCT.yaml giving more information about Talos. Once the PR is merged, you've got yourself a certified Kubernetes distribution. And not only that, you have a sweet logo and everything it represents.

So that was verifying the conformance coverage for a vendor. Now we can talk about improving conformance coverage, and a great tool for that is APISnoop. The reasoning behind this is that for conformance to have value and meaning, the API must be reliable and consistent, and the way to ensure that consistency is through conformance tests. APISnoop is intended to help with all aspects of test writing and test coverage for Kubernetes, from identifying the gaps in coverage, to closing those gaps with tests, to preventing new gaps from happening.

APISnoop is powered by a database that we call SnoopDB. This is simply a Postgres database that is filled with the API schema from Kubernetes' swagger.json, as well as audit events from the CI jobs running the full e2e suites on clusters. You can also deploy APISnoop to your own cluster, at which point it will log the live audit events as they happen. This becomes useful for test writing, to get immediate results ensuring that your tests are hitting what you expect.

So let's start by talking about identifying gaps, and for that we can use apisnoop.cncf.io. On this website, the data in SnoopDB showing the current coverage is visualized as an explorable graph. We can zoom in to the section of the Kubernetes API that we are interested in. For example, we wanted to see stable and core. The API is color coded, and the gray areas show the endpoints that are untested.
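As a rough sketch of what such a submission contains, the directory layout looks something like the following. The product name, the Kubernetes version directory, and the PRODUCT.yaml field values here are all made up for illustration; the authoritative list of required files and fields is in the k8s-conformance repo's instructions.

```shell
# Hypothetical submission directory, e.g. vX.YY/<product-name>/
mkdir -p v1.19/exampleproduct
cd v1.19/exampleproduct

# PRODUCT.yaml describes the product being certified (values are examples)
cat > PRODUCT.yaml <<'EOF'
vendor: Example Org
name: Example Kubernetes Distribution
version: v1.19.0
website_url: https://example.com
documentation_url: https://example.com/docs
product_logo_url: https://example.com/logo.svg
type: distribution
description: An example distribution, for illustration only.
EOF

# The other required files: a README explaining how to reproduce the
# results, plus e2e.log and junit_01.xml from the Sonobuoy tarball.
touch README.md e2e.log junit_01.xml

ls
```

The four files together are what reviewers and the automated checks look at when the PR is opened.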
The lighter areas show those that are tested, but not conformance tested, and the solid colored areas show those with conformance tests. Then there's a nice little summary on the side. Right now there's almost 60% coverage on the stable core part of Kubernetes. As you zoom into the various sections, you'll also get a shareable link, so you can give this to others to help a conversation, or use it as a quick way to return to the part of Kubernetes you're interested in. Along with that, we have a nice conformance progress page showing the history of the conformance coverage. Here we can see the growth in the Kubernetes API and the total number of endpoints, as well as an increase in coverage, with a cool increase in velocity happening, which is great to see.

Now, going back to the presentation: after identifying gaps, we want to close them with tests, and we can do that with test driving. Let's go ahead and do a demo of how ii writes tests and our flow for going through it. When we are writing tests, we start at sharing.io. Here we can create clusters on the fly and deploy them with applications to get started. They are configured to make pairing really easy, set up with tmate and so on, so we can share a session easily. In this case, here is a cluster that we set up beforehand that already has APISnoop and our test writing repo deployed to it. From here, we can log into one of the pods with a simple SSH command.

Now, let's check in on where we've logged in. Here we'll do a pairing session: Caleb will be the driver, I will be the navigator. What we're seeing is our text editor, which is a highly customized version of Emacs, and an org file within it that shows the flow for test driving.
Basically, what we do is we create a mock test beforehand, where we identify an endpoint we want to test, write a version that could maybe hit it, verify that it does, and then submit that as a ticket, so we can get early feedback on how it's written. Based on that feedback, we can then convert it into an actual e2e test, which we would submit as a PR.

Let's see that in a bit more detail. First, we want to identify an untested feature, and we can do that with APISnoop, here as the database connected to this document. We're looking at all the untested stable endpoints, or at least 25 of them. Let's filter that even further, and say we want to look only at ones that are part of the Deployment kind. So here is one untested endpoint that could be good to write something for. Maybe we want a bit more detail about it, like its path. We'll just add that to the query, run it again, and now we have the endpoint and its path, which will help us later. We can document it as much as needed following the API reference, and then go ahead and write a mock test for it.

Now, in the interest of time, we won't write a brand new test. Instead, we'll go through one that we've already written, which has actually already been merged as an e2e test. At the start, we create an outline showing our intent. In this case, we're going to do a lifecycle test, going through the creation, the patching, the getting, the listing of a deployment, and then the deletion. So here is the function itself, the mock test written into the document, and we can actually run it from here. As we run it, if we switch over to kubectl, we can see that it is firing on our cluster, going through what we expected it to do. It also summarizes the results back into the document, so we can verify that it did what we intended.
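The kind of query described here can be sketched as a psql call against SnoopDB. The view and column names below are assumptions based on the flow in the demo, not the exact SnoopDB schema; check your own SnoopDB instance for the real names.

```shell
# Ask SnoopDB for untested stable endpoints on the Deployment kind,
# including each endpoint's path (names here are illustrative).
psql -h localhost -U apisnoop -d snoopdb -c "
  select endpoint, path
    from untested_stable_endpoints
   where endpoint ilike '%deployment%'
   limit 25;
"
```

In the demo this query lives inside the org file, so the results land directly in the document being edited.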
Now, we were trying to target specific endpoints, and we want to see if we did hit them, which we can do, again, using APISnoop. APISnoop is connected to the cluster and is grabbing all the audit events as they happen, and the test that we just wrote had a distinct user agent set for live test writing. We can filter to that user agent so we see only the endpoints hit by this test, which we'll do in this query. There we can see that we hit a number of deployment endpoints, as we intended, and we can get the projected change in coverage. This gives us a nice point value for the number of new endpoints we'd hit if this test were to be merged. Because we're using an existing test, it's showing that there would be no change in the number, as we'd expect.

When this is completed, we export this entire document into GitHub flavored markdown, which we can then copy and open up as a ticket in the Kubernetes repo. And actually, we can switch back over to the browser, because this entire flow that you saw was used to create this ticket that we see here. As you see, the org file we moved through became the body of this GitHub ticket. This ticket was then used as the basis for a PR for an actual e2e test, which was then merged and is now part of the conformance suite.

Thank you for that. So that was our demo of test driving, or one way to do it. Once we are starting to close those gaps, we also want to prevent them from happening. Absolutely. So yeah, we can prevent gaps as well as fill them. In order to prevent them, we need to establish a healthy baseline. What that means is: any endpoint that's promoted to GA must have a conformance test, and all certified clusters will be expected to pass all conformance tests. The process is very automated, so it should be easy to follow. On testgrid.k8s.io, we see the jobs that help automate the processes of the Kubernetes community. This includes testing.
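The user-agent filter described above can be sketched as another SnoopDB query. Kubernetes audit events record the user agent of each API request, which is what makes this filter possible; as before, the table, column, and user-agent value here are illustrative assumptions rather than the exact SnoopDB schema.

```shell
# Show only the endpoints hit by our in-progress test, identified by
# the user agent we set while test writing (names are illustrative).
psql -h localhost -U apisnoop -d snoopdb -c "
  select distinct endpoint
    from audit_events
   where useragent like 'live-test-writing%';
"
```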
Here we see two jobs, conformance audit and conformance gate. These are both owned by SIG Architecture. In our earlier example with Sonobuoy, we ran an incomplete test suite. If we were to submit that as a PR, including those results, we'd want to quickly know whether those results were actually the entire suite; otherwise, the submission would not be valid. We can automate that auditing of the results with the conformance audit job. So let's say that we did submit it as a PR, which we see here. When we submitted it, the job would run and then, through a friendly cncf-ci bot, tell us that this conformance request did not include all of the required tests for that version of Kubernetes. It gives us the first test it could find that did not pass or was not included, to help us narrow it down. It also adds a label of "required tests missing", which makes it easier to filter or find quickly. You'll see that it is part of a number of labels, each of them coming from different processes or jobs, all intended to make it easier to manage and track the activity happening across the various GitHub repos.

The next is the conformance gate. This uses APISnoop under the hood. It alerts when endpoints are promoted to GA without conformance tests. There is a description shown in the YAML right here, and ultimately it's intended to be a release blocking job. The job manifest has an email for alerts. When alerted, we will be able to tell SIG Release of an untested endpoint. The endpoint will need a test, or it will not be included in the next release.

So in summary, to improve Kubernetes conformance coverage, we want to identify any gaps using APISnoop and SnoopDB, close those gaps through the writing and promotion of tests, and prevent new ones from forming with release blocking jobs. As we said, conformance ensures that every vendor's version of Kubernetes supports the required APIs, just as the open source community version does.
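The check the audit job performs can be imitated locally with a rough sketch: compare the required conformance tests for a release against the tests that actually appear in a submitted junit_01.xml. Everything below, including the file names and the tiny sample data, is illustrative; the real job has its own tooling and the real required-test list is much longer.

```shell
# A tiny sample: two "required" conformance tests for a release, and a
# submitted junit_01.xml that only contains one of them.
cat > required_tests.txt <<'EOF'
[sig-apps] Deployment should run the lifecycle of a Deployment [Conformance]
[sig-network] Services should serve a basic endpoint from pods [Conformance]
EOF

cat > junit_01.xml <<'EOF'
<testsuite>
  <testcase name="[sig-apps] Deployment should run the lifecycle of a Deployment [Conformance]"/>
</testsuite>
EOF

# Report any required test that is missing from the submitted results,
# the way the cncf-ci bot reports the first missing test it finds.
missing=0
while IFS= read -r test; do
  if ! grep -qF "$test" junit_01.xml; then
    echo "MISSING: $test"
    missing=$((missing + 1))
  fi
done < required_tests.txt

echo "$missing required test(s) missing"
```

With this sample input, the sig-network test is reported as missing, which is the situation that would earn a PR the "required tests missing" label.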
There is a strong ecosystem around conformance supporting all levels of involvement with Kubernetes, from vendors to users to developers. Thanks for being a part of this presentation. Shall we go with the Q&A? I think we shall. See you there.