Hello and welcome to Contributing to Kubernetes Conformance Coverage. I'm with ii, a coding cooperative in New Zealand with a focus on cooperative coding. Pairing is sharing for us, and any time two or more of us are together cooperating on things, that's where ii is. The people of the co-op: myself and my friend Caleb, who is co-hosting this talk with me; our friend Zach, down in Wellington, who focuses on the UI, the database, and many other pieces; Stephen Heywood, who does a lot of test writing; and the McClimans brothers, separated by continents during COVID, collaborating on Prow and the project management. We're going to focus on two major components: the intro and the deep dive. In the intro, we'll ask some questions about what Kubernetes conformance is and how to participate, and wrap up at the end by showing a conformance submission passing. The first question you might have is: what is Kubernetes conformance? Great question. It's good to have shared expectations about the Kubernetes API, so it behaves the same way regardless of who's hosting it for you. I know it's important to me that my workloads run everywhere I see the Kubernetes logo. Luckily we have a conformance website that walks everyone through what Kubernetes certification provides and how to participate; our deep dive will go a little further into that. During this intro, we'll go through the k8s-conformance repo and create a new folder underneath one of the release directories, which run from v1.9 through v1.19, though I think we only allow the last few releases to be certified, as long as they're still supported. Why is Kubernetes conformance important? Because the vendor shouldn't matter for your workloads. That's fair: I'd expect my stuff to run anywhere regardless of vendor, and the Kubernetes conformance program ensures this for us.
But now that you've got these Kubernetes expectations, who might be able to meet them for you? Quite a few: it was about 67 or more. That's great. The CNCF actually has a landscape: you can go to the website, click on the certified Kubernetes providers link on the left, and look at this long list of providers that can meet your expectations. If you're wanting to get on that list, you might be asking, well, how do I certify my Kubernetes distribution? Great question. There are some public instructions on this if this is you. What you'll need is four different files: documentation on how to make this reproducible, some product metadata, and then two types of logs. Why don't we go through and get some of that set up? I like that. We have the README from the kind team, who submitted their four files about seven months ago for v1.18, and their documentation includes how to run it. So we'll go ahead and change our view to include that and walk through the steps. We've just kicked off a new Sonobuoy run, so we're going to look at the logs for that as well. The logs let us see the tests running: setting up the pod, creating the watch, deleting the pods, and verifying. All those verification steps run as part of the Sonobuoy test suite, and they'll create some output for us. While we're waiting for that output, we can kubectl get the sonobuoy namespace and see what's there. You'll note the sonobuoy pod running along with all the usual expected components, including this API snippet that might be important later; remember that. Sonobuoy will export some results, including the two log files that you'll need, and we'll do a quick check to make sure those log files exist. They do. Excellent.
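For anyone wanting to reproduce that run, here's a minimal sketch of the Sonobuoy steps just described. A reachable cluster and the sonobuoy CLI are assumptions, so those commands are guarded; the `check_results` helper is our own illustration, not part of Sonobuoy.

```shell
# Sketch of the conformance run from the demo. The cluster-facing commands
# only execute if the sonobuoy CLI and a reachable cluster are present.
set -eu

if command -v sonobuoy >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  sonobuoy run --mode=certified-conformance --wait  # run the full conformance suite
  tarball=$(sonobuoy retrieve)                      # fetch the results tarball
  mkdir -p results && tar -xzf "$tarball" -C results
fi

# The submission needs these two files from the results; this helper
# (our own, for illustration) checks that they exist in a directory.
check_results() {
  dir=$1
  for f in e2e.log junit_01.xml; do
    [ -f "$dir/$f" ] || { echo "missing: $f"; return 1; }
  done
  echo "both log files present"
}
```

In the real tarball those two files typically sit under the e2e plugin's results directory; point the helper at wherever you extracted them.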
So, combining the files we need to submit our conformance results with the instructions that they provide, I'll focus on the website here, which shows uploading the files and the fields you'll need to fill in for your PRODUCT.yaml that will be part of the PR we upload. In order to create that PR, we go back into our terminal window and show forking the k8s-conformance repository, as well as adding a remote to push our changes to. The branch name doesn't really matter, but we usually base it on the name of your product and the version, because that's what the submitted results will be. The next step is to copy those results into place and combine them with your README and your product YAML, and once we push those up, we can create a PR from that branch. Here's the example PR: it includes our not-kind version with a URL and information for our logo, as well as our README, which doesn't yet include nice instructions, and those two logs. If we click on create pull request, there's a pre-submission checklist you can go through that's already filled out for k8s-conformance. For our demo here, I'm actually going to file this against the cncf-infra k8s-conformance repo and look at the pull request that we're creating there. We'll come back to this a little later in the demo, but the contents of the PR are these four files that are part of the documentation, and we'll visit that again after our deep dive. Thanks for that, Caleb. Our deep dive covers three main things regarding the gaps we have in Kubernetes conformance coverage: identifying those gaps, closing those gaps, and preventing new gaps, so we can color in all of the parts that are missing. First, identifying the gaps in Kubernetes coverage is going to require filling in our graph here.
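As a rough sketch of the submission layout just described: the version directory, product name, branch name, and file contents below are placeholders, while the four file names are the ones the k8s-conformance instructions ask for.

```shell
# Assemble the four submission files in a scratch directory (placeholders
# throughout; a real submission starts from a fork of cncf/k8s-conformance).
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Real flow (not executed here):
#   git clone https://github.com/<your-fork>/k8s-conformance.git
#   cd k8s-conformance
#   git checkout -b myproduct-v1.18   # branch named after product + version

mkdir -p v1.18/myproduct
touch v1.18/myproduct/README.md       # reproduction instructions
touch v1.18/myproduct/PRODUCT.yaml    # product metadata: name, URL, logo, ...
touch v1.18/myproduct/e2e.log         # from the Sonobuoy results
touch v1.18/myproduct/junit_01.xml    # from the Sonobuoy results

ls v1.18/myproduct
# Then: git add -A, commit, push to the fork, and open the PR.
```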
apisnoop.cncf.io is built on top of a database that looks at the entire surface area of the Kubernetes API and lets us see the operations that have conformance coverage (the darker color), test coverage that's not yet conformance (a lighter color), and the gray areas, which have no test coverage at all. You can see there are quite a few gaps we'd like to fill in, particularly in the green stable area; these are the gaps we focus on. The underlying database that we use to create that graph is called SnoopDB, and it has a few schemas populated from public datasets. First, we pull in the swagger JSON from the Kubernetes repository, usually for the branch we're focused on getting data for, plus conformance CI jobs that we have to configure to generate JSON audit logs, with audit policies logging all the events we're interested in auditing. The last part is some testing schemas that help when we're trying to write tests or get data in a live-cluster situation. You might be asking yourself how you can ask and answer your own questions by deploying SnoopDB. Great question. There are actually many ways to deploy APISnoop: you can use your own clusters, or if you want to do some local testing and figure out the endpoints you care about, you can deploy it with kind using a configuration that we provide. If you clone the APISnoop repo, check out the kind folder; there's a neat little configuration that just might be for you, which will bring up APISnoop and auto-load logs straight away. What's great about having this local to you is that you can take any of the questions we're asking in the DB, modify them to ask your own questions, and get results without having to wait for us to come up with a solution. The schemas we're going to focus on first are the audit events and the open API.
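To give a feel for the audit-policy side of those CI jobs: a policy like the sketch below, handed to the API server, records every request and response body, so each call a test makes ends up in the JSON audit log. This is an illustrative policy of that general shape, not the exact one the jobs use.

```yaml
# Illustrative audit policy (audit.k8s.io/v1). Wired into the API server via
# --audit-policy-file=<this file> and --audit-log-path=<where to write JSON>.
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log request and response bodies for everything, so SnoopDB can see the
  # user agent, verb, and endpoint for every call the tests make.
  - level: RequestResponse
```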
The open API schema is pulled from that swagger JSON in the Kubernetes repository. I think it weighs in at around two megs normally, but once it's loaded into the database with all of the indexes and extra data, it weighs in at around five megs. The audit event table, on the other hand, pulls in probably two to five gigs, depending on how many CI jobs you're pulling from, but it tends to consolidate down fairly nicely because of the amount of overlap in fields. We load the swagger JSON directly from the GitHub repository for Kubernetes, and this gives us a table based on just the shape of the Kubernetes API, so we can be really clear and precise about what it is we're trying to cover. Even without any logs, this lets us run, for example, a psql query that focuses on the upcoming release, 1.20, and the stable endpoints, to show us just what's brand new and stable in this next release. Keep an eye on this internal API server group, because it'll be important later in the presentation when we have some UI elements to show. In addition to the surface area, we have this big gray area we need to color in: the areas that we're actually testing. SnoopDB has a second phase it goes through: beyond loading just the API definition, it also loads test data. The kind conformance audit job was created this week in order to create an audit log. That audit log is brought into SnoopDB so we can have all of the tests and see all the endpoints they hit. The underlying database table that ingests those logs lets us know and query whether a test was a conformance test or not, along with all of the raw JSON data in that audit log entry. So we have access to everything available in the Kubernetes audit logs, to query and combine with the surface area.
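A sketch of the kind of psql query this refers to, runnable against a local SnoopDB. The database, table, and column names below are assumptions for illustration; the real SnoopDB schema may name things differently.

```shell
# Hypothetical query: endpoints that are stable in 1.20 but absent in 1.19.
# Only actually sent if psql is installed (i.e. you have SnoopDB running).
query="
  SELECT endpoint
    FROM open_api                 -- assumed table name
   WHERE release = '1.20'
     AND level   = 'stable'
     AND endpoint NOT IN (
       SELECT endpoint FROM open_api WHERE release = '1.19'
     );
"

if command -v psql >/dev/null 2>&1; then
  psql -d apisnoop -c "$query" || true   # database name 'apisnoop' is a guess
fi
```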
As of this morning we were only using a couple of jobs to inform our coverage: ci-kubernetes-gce-conformance-latest and the e2e GCI GCE job. If you look tomorrow, it'll include the kind job as well. Those are all available publicly on Prow, so when you load up SnoopDB it will retrieve them and put them into your local instance. We had to make some changes to the e2e test framework and the API server in order to have our tests show up in these audit logs. First, we had to ensure that the e2e framework submitted a user agent that changed based on the current context of the Ginkgo test it was executing. That allowed the user agent to be transmitted and picked up by the API server, which previously did not include the user agent in its audit logs; our changes there enabled the user agent to be written all the way through so we could pick it up in the database. The conformance tests can now be queried by just looking through the audit event table and finding, for example, where we have unique tests that are conformance. If you remember, earlier we had that query giving us the new stable endpoints in 1.20. This is the front page of APISnoop, but if you go down to the bottom of any of the releases that you can select, you'll see that list of new endpoints. It's pretty obvious that there's a new category being brought in in 1.20 called internal API server, which has a slew of alpha endpoints. As they progress from alpha to beta to stable, they will need conformance tests to reach stable, but for now this lets us anticipate and communicate, probably with whichever SIG owns the internal API server group, to let them know: hey, we're going to need conformance tests before this is part of the public release of Kubernetes.
Historically, we have about three years of the conformance program. It started back in 1.9, when we began labeling tests with the conformance tag, and we gained quite a few red areas; anytime you see red, that's new conformance tests taking care of old gray debt. The orange areas are where we introduced new endpoints without including any tests, which makes the hole we're digging deeper while we're trying to fill it in. We were lucky in that, from 1.15 onward, we don't really see a lot of new endpoints being promoted without tests, and we'll show a little later how we're ensuring that will never happen again. The red is where we have written tests for old endpoints and erased that debt: we color in that gray area with red, and that reduces our gaps in coverage. The last slide around identifying shows our current conformance debt all the way back to 1.15; we hope to clear all of our debt back to 1.11 by the time we cut 1.20. Finally, we're going to go through closing gaps in Kubernetes conformance coverage, and I'll turn that over to Caleb. Yeah, so we've got this flow for finding those endpoints that need tests and closing that gap. We start off with a query to focus on specific untested endpoints; here we're searching for five stable core endpoints that are eligible for conformance but are lacking tests. Once we get the endpoints, we go to the reference docs to understand the API endpoint (shout out to SIG Docs, doing great work). We want to understand the way we can talk to the resource and all of its function handlers, so we go to client-go for that, which is really useful, and we look in the core/v1 folder. Here's an outline of the test and the way that we'll write it. This is often a lifecycle of the resource; we use this pattern quite often because we want to hit more endpoints.
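That untested-endpoints query looks something like the sketch below. Again, the view name, columns, and database name are assumptions based on what's described in the talk, not the verified SnoopDB schema.

```shell
# Hypothetical query for five stable, conformance-eligible core endpoints
# that no test currently hits. Only sent if psql is available locally.
query="
  SELECT endpoint, description
    FROM untested_stable_core_endpoints   -- assumed view name
   LIMIT 5;
"

if command -v psql >/dev/null 2>&1; then
  psql -d apisnoop -c "$query" || true    # database name 'apisnoop' is a guess
fi
```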
This allows discussion of the approach of the test without needing to write out a fully fleshed-out test or even any mock test: just high-level discussion for the conformance sub-project. Once we've gotten to that point, in the same ticket we're able to show an example, because at this point we don't want to use the e2e test suite; we want to be able to display what we want to do as tickets, to make it really easy to discuss. That is all before creating a PR. Next, we run mock tests in our local clusters, and we make sure that our user agent is set to live-test-writing. This allows us to see the new untested endpoints that we want to target in our tests. Following on, we noticed that this particular test wasn't effective, because it created a pod and pods are already covered in conformance, so we don't get any change in coverage there, which is fine; we'll look again for more endpoints later on. By this point you have endpoints which are unhit, you've proven that you can hit them, and you've got some code that actually does that, so now we can submit a ticket. In this screenshot right here we have a whole bunch of issues on GitHub that were submitted as tickets; they're exported as markdown and all ready for review. So that's what it looks like to find those unhit endpoints and create tests for them. It's great to have a team focused on filling in all of those gaps; I know that Caleb, myself, Stephen, and a few other people within the team have been able to write those tests. The next thing we do is prevent new gaps from forming in Kubernetes coverage. We work a bit with the testing and Prow infrastructure, particularly TestGrid, creating some dashboards for the conformance sub-project of SIG Architecture, and that allows us to have Prow jobs that focus on conformance in order to do two things.
Our conformance audit job is the new Prow job that generates audit logs, but we also have the conformance gate, the apisnoop conformance gate job, which will send emails to the group of folks interested in conformance test failures, so that we can eventually promote this to a release-blocking conformance job. That lets us signal to SIG Release that a new API has come through without conformance tests, so there's time to either revert it back to beta or ensure the tests are fully written into conformance before that new API can be part of a release. To summarize: our deep dive was all about the gaps in Kubernetes conformance coverage. We identify the gaps using apisnoop.cncf.io with the underlying SnoopDB; we close those gaps using Humacs and the in-cluster workflow that ii uses; and we prevent gaps by creating our release-blocking jobs. So let's back out of the deep dive for a moment, to verifying the community's conformance submissions. We're using a bit of prow.cncf.io for that; note that that is the CNCF's Prow instance, versus the Kubernetes community's instance. This is about our PR submission from earlier. The results get submitted to the CNCF's k8s-conformance repository. Initially submissions were reviewed by humans, which is a lot of work, and we wanted to validate that a PR was ready for just a thumbs-up approval, with metadata around it. Our bot, powered by a Prow plugin that we wrote, goes through and validates that the title, the logs, and the tests all follow the protocols and that the tests were all run. In this case we have a required-tests-missing label, and that's because, in order to speed things up, the Sonobuoy run we did earlier against our not-quite-kind deployment was only for one test, so obviously it didn't pass. The communication might be: please try again, and here are the directions for running it.
But fear not: many other certified distributions have successfully gotten the tests-verified label for the release they were interested in, which allows the CNCF to go through, approve those merges, and let you use the certified Kubernetes logo. That's pretty much it. Here are the main links we talked about during this talk: the CNCF certified Kubernetes repository, the APISnoop website, TestGrid, and the two repositories within the CNCF for submitting your conformance results and for the work that the APISnoop team does. We'll open it up to Q&A now, and Caleb and I will see you there.